Netdev List
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 16:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	kent.overstreet@gmail.com, linux1394-devel@lists.sourceforge.net,
	jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	qla2xxx-upstream@qlogic.com, target-devel@vger.kernel.org,
	netdev@vger.kernel.org, mawilcox@microsoft.com
In-Reply-To: <da5220a5ed4bed210c31a7517389e787a3b1a01f.camel@wdc.com>

On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > index 025dc2d3f3de..cdf671c2af61 100644
> > --- a/drivers/scsi/qla2xxx/qla_target.c
> > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> >  		return;
> >  	}
> >  	cmd->jiffies_at_free = get_jiffies_64();
> > -	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > +	sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > +			cmd->se_cmd.map_cpu);
> >  }
> >  EXPORT_SYMBOL(qlt_free_cmd);
> 
> Please introduce functions in the target core for allocating and freeing a tag
> instead of spreading the knowledge of how to allocate and free tags over all
> target drivers.

I can't without doing an unreasonably large amount of work on drivers that
I have no way to test.  Some of the drivers have the se_cmd already; some
of them don't.  I'd be happy to introduce a common function for freeing
a tag.

> > +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> > +{
> > +	int tag = -1;
> > +	DEFINE_WAIT(wait);
> > +	struct sbq_wait_state *ws;
> > +
> > +	if (state == TASK_RUNNING)
> > +		return tag;
> > +
> > +	ws = &se_sess->sess_tag_pool.ws[0];
> > +	for (;;) {
> > +		prepare_to_wait_exclusive(&ws->wait, &wait, state);
> > +		if (signal_pending_state(state, current))
> > +			break;
> 
> This looks weird to me. Shouldn't target code ignore signals instead of causing
> tag allocation to fail if a signal is received?

It's what the current code did:

-               if (signal_pending_state(state, current)) {
-                       tag = -ERESTARTSYS;
-                       break;
-               }

and the current callers literally indicate that they want signals:

drivers/infiniband/ulp/isert/ib_isert.c:        cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
drivers/target/iscsi/iscsi_target.c:    cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);

(etc)

* Re: [PATCH net 1/2] ipv4: igmp: use alarmtimer to prevent delayed reports
From: Andrew Lunn @ 2018-06-12 16:28 UTC (permalink / raw)
  To: Tejaswi Tanikella; +Cc: netdev, f.fainelli, davem
In-Reply-To: <20180611115058.GA12452@tejaswit-linux.qualcomm.com>

On Mon, Jun 11, 2018 at 05:21:05PM +0530, Tejaswi Tanikella wrote:
> On receiving an IGMPv2/v3 query, based on max_delay set in the header a
> timer is started to send out a response after a random time within
> max_delay. If the system then moves into suspend state, the Report is
> delayed until the system wakes up.
> 
> Use an alarmtimer instead of a timer. The alarmtimer will wake the
> system up from suspend to send out the IGMP report.

Hi Tejaswi

I think I must be missing something here. If we are suspended, we are
not receiving multicast frames. If we are not receiving frames, why do
we need to reply to the query?

Once we resume, I expect we will reply to the next query. You could
optimise restarting the flow by immediately sending a membership
report, same as when the setsockopt is used to join the group.

	Andrew

* Re: [PATCH 2/2] ktime: helpers to convert between ktime and jiffies
From: Andrew Lunn @ 2018-06-12 16:30 UTC (permalink / raw)
  To: Tejaswi Tanikella; +Cc: netdev, f.fainelli, davem
In-Reply-To: <20180611115218.GA23539@tejaswit-linux.qualcomm.com>

On Mon, Jun 11, 2018 at 05:22:28PM +0530, Tejaswi Tanikella wrote:
> Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
> ---
>  include/linux/ktime.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 5b9fddb..4881483 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -96,6 +96,10 @@ static inline ktime_t timeval_to_ktime(struct timeval tv)
>  /* Convert ktime_t to nanoseconds - NOP in the scalar storage format: */
>  #define ktime_to_ns(kt)			(kt)
>  
> +/* ktime to jiffies and back */
> +#define ktime_to_jiffies(kt)		nsecs_to_jiffies(kt)
> +#define jiffies_to_ktime(j)		jiffies_to_nsecs(j)

Hi Tejaswi

You should also add some users of these new helpers.

    Andrew
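
For readers following the review, the conversion these two macros perform
can be modelled in userspace. This is a minimal sketch, not the kernel
implementation: it assumes HZ = 250 and that HZ divides one second evenly,
and every SKETCH_* / sketch_* name is hypothetical. It illustrates that a
ktime value (nanoseconds) loses its sub-jiffy remainder on the way to
jiffies, so a round trip is not lossless.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate; real kernels configure HZ at build time. */
#define SKETCH_HZ             250ULL
#define SKETCH_NSEC_PER_SEC   1000000000ULL
#define SKETCH_NSEC_PER_JIFFY (SKETCH_NSEC_PER_SEC / SKETCH_HZ)

/* Model of ktime -> jiffies: truncating division by the tick length. */
static uint64_t sketch_ns_to_jiffies(int64_t ns)
{
	assert(ns >= 0); /* the sketch only handles non-negative times */
	return (uint64_t)ns / SKETCH_NSEC_PER_JIFFY;
}

/* Model of jiffies -> ktime: multiply by the tick length. */
static int64_t sketch_jiffies_to_ns(uint64_t j)
{
	return (int64_t)(j * SKETCH_NSEC_PER_JIFFY);
}
```

With these assumptions one jiffy is 4,000,000 ns, so 10,000,000 ns maps to
2 jiffies and converts back to 8,000,000 ns. A caller that needs "at least
this long" semantics would have to round up itself.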

* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Bart Van Assche @ 2018-06-12 16:32 UTC (permalink / raw)
  To: willy@infradead.org
  Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	kvm@vger.kernel.org, mawilcox@microsoft.com,
	netdev@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
	linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <20180612161526.GE19433@bombadil.infradead.org>

On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > > index 025dc2d3f3de..cdf671c2af61 100644
> > > --- a/drivers/scsi/qla2xxx/qla_target.c
> > > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > >  		return;
> > >  	}
> > >  	cmd->jiffies_at_free = get_jiffies_64();
> > > -	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > > +	sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > > +			cmd->se_cmd.map_cpu);
> > >  }
> > >  EXPORT_SYMBOL(qlt_free_cmd);
> > 
> > Please introduce functions in the target core for allocating and freeing a tag
> > instead of spreading the knowledge of how to allocate and free tags over all
> > target drivers.
> 
> I can't without doing an unreasonably large amount of work on drivers that
> I have no way to test.  Some of the drivers have the se_cmd already; some
> of them don't.  I'd be happy to introduce a common function for freeing
> a tag.

Which target drivers are you referring to? If you are referring to the sbp driver:
I think that driver is dead and can be removed from the kernel tree. I don't even
know whether that driver has ever had any users other than the developer of that
driver.

> > This looks weird to me. Shouldn't target code ignore signals instead of causing
> > tag allocation to fail if a signal is received?
> 
> It's what the current code did:
> 
> -               if (signal_pending_state(state, current)) {
> -                       tag = -ERESTARTSYS;
> -                       break;
> -               }
> 
> and the current callers literally indicate that they want signals:
> 
> drivers/infiniband/ulp/isert/ib_isert.c:        cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
> drivers/target/iscsi/iscsi_target.c:    cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);

Right, the iSCSI target driver uses signals to wake up threads (see also the
send_sig() calls in the iSCSI target code).

Bart.

* Re: [PATCH] net: stmmac: fix build failure due to missing COMMON_CLK dependency
From: Andy Shevchenko @ 2018-06-12 16:35 UTC (permalink / raw)
  To: David Miller, Geert Uytterhoeven
  Cc: Corentin Labbe, Alexandre TORGUE, Giuseppe CAVALLARO,
	Linux Kernel Mailing List, netdev, linux-sunxi
In-Reply-To: <20180608.105926.600207780816212953.davem@davemloft.net>

On Fri, Jun 8, 2018 at 5:59 PM, David Miller <davem@davemloft.net> wrote:
> From: Corentin Labbe <clabbe@baylibre.com>
> Date: Wed,  6 Jun 2018 18:45:22 +0000
>
>> This patch fixes the build failure on m68k:
>> drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.o: In function `ipq806x_gmac_probe':
>> dwmac-ipq806x.c:(.text+0xda): undefined reference to `clk_set_rate'
>> drivers/net/ethernet/stmicro/stmmac/dwmac-rk.o: In function `rk_gmac_probe':
>> dwmac-rk.c:(.text+0x1e58): undefined reference to `clk_set_rate'
>> drivers/net/ethernet/stmicro/stmmac/dwmac-sti.o: In function `stid127_fix_retime_src':
>> dwmac-sti.c:(.text+0xd8): undefined reference to `clk_set_rate'
>> dwmac-sti.c:(.text+0x114): undefined reference to `clk_set_rate'
>> drivers/net/ethernet/stmicro/stmmac/dwmac-sti.o:dwmac-sti.c:(.text+0x12c): more undefined references to `clk_set_rate' follow
>> Lots of stmmac platform drivers need COMMON_CLK in their Kconfig depends.
>>
>> Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
>
> Applied.

I think Geert has a better fix https://lkml.org/lkml/2018/6/11/122

-- 
With Best Regards,
Andy Shevchenko

* Re: [PATCH net-next 3/6] net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
From: Andrew Lunn @ 2018-06-12 16:36 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, davem, corbet, akpm, netdev, linux-doc,
	linux-kernel, linux-omap, vinicius.gomes, henrik,
	jesus.sanchez-palencia, ilias.apalodimas, p-varis, spatton,
	francois.ozog, yogeshs, nsekhar
In-Reply-To: <20180611133047.4818-4-ivan.khoronzhuk@linaro.org>

On Mon, Jun 11, 2018 at 04:30:44PM +0300, Ivan Khoronzhuk wrote:
> It's possible to offload the VLAN-to-tc priority mapping, assuming
> sk_prio == L2 prio.
> 
> Example:
> $ ethtool -L eth0 rx 1 tx 4
> 
> $ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
> map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
> 
> $ tc -g class show dev eth0
> +---(100:ffe2) mqprio
> |    +---(100:3) mqprio
> |    +---(100:4) mqprio
> |    
> +---(100:ffe1) mqprio
> |    +---(100:2) mqprio
> |    
> +---(100:ffe0) mqprio
>      +---(100:1) mqprio
> 
> Here, 100:1 is txq0, 100:2 is txq1, 100:3 is txq2, 100:4 is txq3
> txq0 belongs to tc0, txq1 to tc1, txq2 and txq3 to tc2
> The offload part only maps L2 prio to classes of traffic, but not
> to transmit queues, so to direct traffic to a traffic class, a VLAN
> has to be created with an appropriate egress map.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 82 ++++++++++++++++++++++++++++++++++
>  1 file changed, 82 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 406537d74ec1..fd967d2bce5d 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -39,6 +39,7 @@
>  #include <linux/sys_soc.h>
>  
>  #include <linux/pinctrl/consumer.h>
> +#include <net/pkt_cls.h>
>  
>  #include "cpsw.h"
>  #include "cpsw_ale.h"
> @@ -153,6 +154,8 @@ do {								\
>  #define IRQ_NUM			2
>  #define CPSW_MAX_QUEUES		8
>  #define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
> +#define CPSW_TC_NUM			4
> +#define CPSW_FIFO_SHAPERS_NUM		(CPSW_TC_NUM - 1)
>  
>  #define CPSW_RX_VLAN_ENCAP_HDR_PRIO_SHIFT	29
>  #define CPSW_RX_VLAN_ENCAP_HDR_PRIO_MSK		GENMASK(2, 0)
> @@ -453,6 +456,7 @@ struct cpsw_priv {
>  	u8				mac_addr[ETH_ALEN];
>  	bool				rx_pause;
>  	bool				tx_pause;
> +	bool				mqprio_hw;
>  	u32 emac_port;
>  	struct cpsw_common *cpsw;
>  };
> @@ -1577,6 +1581,14 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_common *cpsw)
>  	soft_reset_slave(slave);
>  }
>  
> +static int cpsw_tc_to_fifo(int tc, int num_tc)
> +{
> +	if (tc == num_tc - 1)
> +		return 0;
> +
> +	return CPSW_FIFO_SHAPERS_NUM - tc;
> +}
> +
>  static int cpsw_ndo_open(struct net_device *ndev)
>  {
>  	struct cpsw_priv *priv = netdev_priv(ndev);
> @@ -2190,6 +2202,75 @@ static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
>  	return ret;
>  }
>  
> +static int cpsw_set_tc(struct net_device *ndev, void *type_data)
> +{

Hi Ivan

Maybe this is not the best of names. What if you add support for
another TC qdisc? So you have another case in the switch statement
below?

Maybe call it cpsw_set_mqprio?

> +static int cpsw_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
> +			     void *type_data)
> +{
> +	switch (type) {
> +	case TC_SETUP_QDISC_MQPRIO:
> +		return cpsw_set_tc(ndev, type_data);
> +
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}

  Andrew
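
The traffic-class-to-FIFO helper quoted above can be exercised on its own.
The sketch below copies cpsw_tc_to_fifo() and the two constants from the
patch, and combines them with the prio-to-tc map from the commit message
example (map 2 2 1 0 2 ..., num_tc 3). The example_* names are hypothetical,
and chaining the two lookups this way is an assumption made for
illustration; the quoted excerpt does not show how the driver consumes the
FIFO number.

```c
#include <assert.h>

#define CPSW_TC_NUM           4
#define CPSW_FIFO_SHAPERS_NUM (CPSW_TC_NUM - 1)

/* Same logic as the quoted helper: the last traffic class goes to
 * FIFO 0, the others count down from the highest FIFO.
 */
static int cpsw_tc_to_fifo(int tc, int num_tc)
{
	if (tc == num_tc - 1)
		return 0;

	return CPSW_FIFO_SHAPERS_NUM - tc;
}

/* prio -> tc map taken from the mqprio example in the commit message. */
static const int example_prio_tc_map[16] = {
	2, 2, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
};

/* Hypothetical helper: chain prio -> tc -> FIFO for num_tc == 3. */
static int example_prio_to_fifo(int prio)
{
	assert(prio >= 0 && prio < 16);
	return cpsw_tc_to_fifo(example_prio_tc_map[prio], 3);
}
```

Under these assumptions priority 3 (tc0) lands in FIFO 3, priority 2 (tc1)
in FIFO 2, and every other priority (tc2, the last class) in FIFO 0.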
  

* Re: problems with SCTP GSO
From: Marcelo Ricardo Leitner @ 2018-06-12 17:05 UTC (permalink / raw)
  To: David Miller; +Cc: lucien.xin, edumazet, netdev
In-Reply-To: <20180611.202905.1954825345357429286.davem@davemloft.net>

On Mon, Jun 11, 2018 at 08:29:05PM -0700, David Miller wrote:
> 
> I would like to bring up some problems with the current GSO
> implementation in SCTP.
> 
> The most important for me right now is that SCTP uses
> "skb_gro_receive()" to build "GSO" frames :-(
> 
> Really it just ends up using the slow path (basically, label 'merge'
> and onwards).
> 
> So, using a GRO helper to build GSO packets is not great.

Okay.

> 
> I want to make major surgery here and the only way I can is if
> it is exactly the GRO demuxing path that uses skb_gro_receive().
> 
> Those paths pass in the list head from the NAPI struct that initiated
> the GRO code paths.  That makes it easy for me to change this to use a
> list_head or a hash chain.
> 
> Probably in the short term SCTP should just have a private helper that
> builds the frag list, appending 'skb' to 'head'.
> 
> In the long term, SCTP should use the page frags just like TCP to
> append the data when building GSO frames.  Then it could actually be
> offloaded and passed into drivers without linearizing.

Sounds like a plan. Shouldn't be too hard to do it.
(I'm out on PTO, btw)

Thanks,
Marcelo

* [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: Bart Van Assche @ 2018-06-12 17:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Bart Van Assche, Maciej Żenczykowski, Eric Dumazet

Revert the patch mentioned in the subject because it breaks at least
the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
daemon to fail to start:

Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.

Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
---
 net/core/sock.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index f333d75ef1a9..bcc41829a16d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -728,22 +728,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			sock_valbool_flag(sk, SOCK_DBG, valbool);
 		break;
 	case SO_REUSEADDR:
-		val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
-		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-		    inet_sk(sk)->inet_num &&
-		    (sk->sk_reuse != val)) {
-			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
-			break;
-		}
-		sk->sk_reuse = val;
+		sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
 		break;
 	case SO_REUSEPORT:
-		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-		    inet_sk(sk)->inet_num &&
-		    (sk->sk_reuseport != valbool)) {
-			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
-			break;
-		}
 		sk->sk_reuseport = valbool;
 		break;
 	case SO_TYPE:
-- 
2.17.0

* Re: [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: Eric Dumazet @ 2018-06-12 17:13 UTC (permalink / raw)
  To: Bart Van Assche, David S . Miller
  Cc: netdev, Maciej Żenczykowski, Eric Dumazet
In-Reply-To: <20180612170555.11733-1-bart.vanassche@wdc.com>



On 06/12/2018 10:05 AM, Bart Van Assche wrote:
> Revert the patch mentioned in the subject because it breaks at least
> the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
> daemon to fail to start:
> 
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
> 
> Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> ---
>  net/core/sock.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)

Yes, this change probably broke a lot of applications, unfortunately.

Acked-by: Eric Dumazet <edumazet@google.com>
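
The check being reverted can be modelled in userspace to show why Avahi
logs "Structure needs cleaning", which is the glibc error string for
EUCLEAN. This is a simplified sketch of the removed SO_REUSEADDR branch
only; struct sketch_sock and sketch_set_reuse() are hypothetical stand-ins
for the kernel's struct sock and sock_setsockopt(), and the state is
reduced to "established or not".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_state { SKETCH_ESTABLISHED, SKETCH_OTHER };

struct sketch_sock {
	bool is_inet;          /* family is PF_INET or PF_INET6 */
	unsigned short num;    /* bound local port, 0 if not bound */
	int reuse;             /* current reuse setting */
	enum sketch_state state;
};

/* Model of the removed check: refuse to change the reuse setting on an
 * inet socket that is already bound to a port.
 */
static int sketch_set_reuse(struct sketch_sock *sk, int val)
{
	assert(sk != NULL);

	if (sk->is_inet && sk->num && sk->reuse != val)
		return sk->state == SKETCH_ESTABLISHED ? -EISCONN : -EUCLEAN;

	sk->reuse = val;
	return 0;
}
```

In this model an application that binds first and sets the option
afterwards gets -EUCLEAN, while the same call on an unbound socket
succeeds. With the revert applied, the assignment happens unconditionally.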

* Re: problems with SCTP GSO
From: Marcelo Ricardo Leitner @ 2018-06-12 17:30 UTC (permalink / raw)
  To: David Miller; +Cc: lucien.xin, edumazet, netdev
In-Reply-To: <20180612170506.GF3877@localhost.localdomain>

On Tue, Jun 12, 2018 at 02:05:06PM -0300, Marcelo Ricardo Leitner wrote:
> On Mon, Jun 11, 2018 at 08:29:05PM -0700, David Miller wrote:
> > 
> > I would like to bring up some problems with the current GSO
> > implementation in SCTP.
> > 
> > The most important for me right now is that SCTP uses
> > "skb_gro_receive()" to build "GSO" frames :-(
> > 
> > Really it just ends up using the slow path (basically, label 'merge'
> > and onwards).
> > 
> > So, using a GRO helper to build GSO packets is not great.
> 
> Okay.
> 
> > 
> > I want to make major surgery here and the only way I can is if
> > it is exactly the GRO demuxing path that uses skb_gro_receive().
> > 
> > Those paths pass in the list head from the NAPI struct that initiated
> > the GRO code paths.  That makes it easy for me to change this to use a
> > list_head or a hash chain.
> > 
> > Probably in the short term SCTP should just have a private helper that
> > builds the frag list, appending 'skb' to 'head'.
> > 
> > In the long term, SCTP should use the page frags just like TCP to
> > append the data when building GSO frames.  Then it could actually be
> > offloaded and passed into drivers without linearizing.
> 
> Sounds like a plan. Shouldn't be too hard to do it.
> (I'm out on PTO, btw)

Xin will work on this in the meantime, at least. Thanks Xin.

> 
> Thanks,
> Marcelo
> 

* Fw: [Bug 200033] New: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h
From: Stephen Hemminger @ 2018-06-12 17:38 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Tue, 12 Jun 2018 01:44:36 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 200033] New: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h


https://bugzilla.kernel.org/show_bug.cgi?id=200033

            Bug ID: 200033
           Summary: stack-out-of-bounds in __xfrm_dst_hash
                    net/xfrm/xfrm_hash.h
           Product: Networking
           Version: 2.5
    Kernel Version: v4.17
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: icytxw@gmail.com
        Regression: No

Created attachment 276483
  --> https://bugzilla.kernel.org/attachment.cgi?id=276483&action=edit  
Found this bug with modified syzkaller

==================================================================
BUG: KASAN: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h:96 [inline]
BUG: KASAN: stack-out-of-bounds in xfrm_dst_hash net/xfrm/xfrm_state.c:61 [inline]
BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x24ab/0x26e0 net/xfrm/xfrm_state.c:953
Read of size 4 at addr ffff880054b17b70 by task syz-executor0/13697

CPU: 0 PID: 13697 Comm: syz-executor0 Not tainted 4.17.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:

The buggy address belongs to the page:
page:ffffea000152c5c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x100000000000000()
raw: 0100000000000000 0000000000000000 ffffea000152c5c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880054b17a00: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2
 ffff880054b17a80: f2 f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 00 00 00
>ffff880054b17b00: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 f2  
                                                             ^
 ffff880054b17b80: f2 f2 f2 00 00 00 00 00 00 00 00 00 f2 f2 f2 f3
 ffff880054b17c00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 13697 Comm: syz-executor0 Tainted: G    B             4.17.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

-- 
You are receiving this mail because:
You are the assignee for the bug.

* Re: [jkirsher/next-queue PATCH v2 2/7] net: Add support for subordinate device traffic classes
From: Florian Fainelli @ 2018-06-12 17:49 UTC (permalink / raw)
  To: Alexander Duyck, intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151835.86792.93718.stgit@ahduyck-green-test.jf.intel.com>

On 06/12/2018 08:18 AM, Alexander Duyck wrote:
> This patch is meant to provide the basic tools needed to allow us to create
> subordinate device traffic classes. The general idea here is to allow
> subdividing the queues of a device into queue groups accessible through an
> upper device such as a macvlan.
> 
> The idea here is to enforce the idea that an upper device has to be a
> single queue device, ideally with IFF_NO_QUEUE set. With that being the
> case we can pretty much guarantee that the tc_to_txq mappings and XPS maps
> for the upper device are unused. As such we could reuse those in order to
> support subdividing the lower device and distributing those queues between
> the subordinate devices.

This is not necessarily a valid paradigm to work with. For instance in
DSA we have IFF_NO_QUEUE devices, but we still expose multiple egress
queues because that is how an application can choose how it wants to get
packets transmitted at the switch level. We have a 1:1 representation
between a queue at the net_device level, and what an egress queue at the
switch level is, so things like buffer reservation etc. can be configured.

I think you should consider that an upper device might want to have a
1:1 mapping to the lower device's queues and make that permissible.
Thoughts?

> 
> In order to distinguish between a regular set of traffic classes and if a
> device is carrying subordinate traffic classes I changed num_tc from a u8
> to an s16 value and use the negative values to represent the subordinate
> pool values. So starting at -1 and running to -32768 we can encode those as
> pool values, and the existing values of 0 to 15 can be maintained.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  include/linux/netdevice.h |   16 ++++++++
>  net/core/dev.c            |   89 +++++++++++++++++++++++++++++++++++++++++++++
>  net/core/net-sysfs.c      |   21 ++++++++++-
>  3 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3ec9850..41b4660 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -569,6 +569,9 @@ struct netdev_queue {
>  	 * (/sys/class/net/DEV/Q/trans_timeout)
>  	 */
>  	unsigned long		trans_timeout;
> +
> +	/* Suboordinate device that the queue has been assigned to */
> +	struct net_device	*sb_dev;
>  /*
>   * write-mostly part
>   */
> @@ -1978,7 +1981,7 @@ struct net_device {
>  #ifdef CONFIG_DCB
>  	const struct dcbnl_rtnl_ops *dcbnl_ops;
>  #endif
> -	u8			num_tc;
> +	s16			num_tc;
>  	struct netdev_tc_txq	tc_to_txq[TC_MAX_QUEUE];
>  	u8			prio_tc_map[TC_BITMASK + 1];
>  
> @@ -2032,6 +2035,17 @@ int netdev_get_num_tc(struct net_device *dev)
>  	return dev->num_tc;
>  }
>  
> +void netdev_unbind_sb_channel(struct net_device *dev,
> +			      struct net_device *sb_dev);
> +int netdev_bind_sb_channel_queue(struct net_device *dev,
> +				 struct net_device *sb_dev,
> +				 u8 tc, u16 count, u16 offset);
> +int netdev_set_sb_channel(struct net_device *dev, u16 channel);
> +static inline int netdev_get_sb_channel(struct net_device *dev)
> +{
> +	return max_t(int, -dev->num_tc, 0);
> +}
> +
>  static inline
>  struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
>  					 unsigned int index)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6e18242..27fe4f2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2068,11 +2068,13 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
>  		struct netdev_tc_txq *tc = &dev->tc_to_txq[0];
>  		int i;
>  
> +		/* walk through the TCs and see if it falls into any of them */
>  		for (i = 0; i < TC_MAX_QUEUE; i++, tc++) {
>  			if ((txq - tc->offset) < tc->count)
>  				return i;
>  		}
>  
> +		/* didn't find it, just return -1 to indicate no match */
>  		return -1;
>  	}
>  
> @@ -2215,7 +2217,14 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
>  	bool active = false;
>  
>  	if (dev->num_tc) {
> +		/* Do not allow XPS on subordinate device directly */
>  		num_tc = dev->num_tc;
> +		if (num_tc < 0)
> +			return -EINVAL;
> +
> +		/* If queue belongs to subordinate dev use its map */
> +		dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
>  		tc = netdev_txq_to_tc(dev, index);
>  		if (tc < 0)
>  			return -EINVAL;
> @@ -2366,11 +2375,25 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
>  EXPORT_SYMBOL(netif_set_xps_queue);
>  
>  #endif
> +static void netdev_unbind_all_sb_channels(struct net_device *dev)
> +{
> +	struct netdev_queue *txq = &dev->_tx[dev->num_tx_queues];
> +
> +	/* Unbind any subordinate channels */
> +	while (txq-- != &dev->_tx[0]) {
> +		if (txq->sb_dev)
> +			netdev_unbind_sb_channel(dev, txq->sb_dev);
> +	}
> +}
> +
>  void netdev_reset_tc(struct net_device *dev)
>  {
>  #ifdef CONFIG_XPS
>  	netif_reset_xps_queues_gt(dev, 0);
>  #endif
> +	netdev_unbind_all_sb_channels(dev);
> +
> +	/* Reset TC configuration of device */
>  	dev->num_tc = 0;
>  	memset(dev->tc_to_txq, 0, sizeof(dev->tc_to_txq));
>  	memset(dev->prio_tc_map, 0, sizeof(dev->prio_tc_map));
> @@ -2399,11 +2422,77 @@ int netdev_set_num_tc(struct net_device *dev, u8 num_tc)
>  #ifdef CONFIG_XPS
>  	netif_reset_xps_queues_gt(dev, 0);
>  #endif
> +	netdev_unbind_all_sb_channels(dev);
> +
>  	dev->num_tc = num_tc;
>  	return 0;
>  }
>  EXPORT_SYMBOL(netdev_set_num_tc);
>  
> +void netdev_unbind_sb_channel(struct net_device *dev,
> +			      struct net_device *sb_dev)
> +{
> +	struct netdev_queue *txq = &dev->_tx[dev->num_tx_queues];
> +
> +#ifdef CONFIG_XPS
> +	netif_reset_xps_queues_gt(sb_dev, 0);
> +#endif
> +	memset(sb_dev->tc_to_txq, 0, sizeof(sb_dev->tc_to_txq));
> +	memset(sb_dev->prio_tc_map, 0, sizeof(sb_dev->prio_tc_map));
> +
> +	while (txq-- != &dev->_tx[0]) {
> +		if (txq->sb_dev == sb_dev)
> +			txq->sb_dev = NULL;
> +	}
> +}
> +EXPORT_SYMBOL(netdev_unbind_sb_channel);
> +
> +int netdev_bind_sb_channel_queue(struct net_device *dev,
> +				 struct net_device *sb_dev,
> +				 u8 tc, u16 count, u16 offset)
> +{
> +	/* Make certain the sb_dev and dev are already configured */
> +	if (sb_dev->num_tc >= 0 || tc >= dev->num_tc)
> +		return -EINVAL;
> +
> +	/* We cannot hand out queues we don't have */
> +	if ((offset + count) > dev->real_num_tx_queues)
> +		return -EINVAL;
> +
> +	/* Record the mapping */
> +	sb_dev->tc_to_txq[tc].count = count;
> +	sb_dev->tc_to_txq[tc].offset = offset;
> +
> +	/* Provide a way for Tx queue to find the tc_to_txq map or
> +	 * XPS map for itself.
> +	 */
> +	while (count--)
> +		netdev_get_tx_queue(dev, count + offset)->sb_dev = sb_dev;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(netdev_bind_sb_channel_queue);
> +
> +int netdev_set_sb_channel(struct net_device *dev, u16 channel)
> +{
> +	/* Do not use a multiqueue device to represent a subordinate channel */
> +	if (netif_is_multiqueue(dev))
> +		return -ENODEV;
> +
> +	/* We allow channels 1 - 32767 to be used for subordinate channels.
> +	 * Channel 0 is meant to be "native" mode and used only to represent
> +	 * the main root device. We allow writing 0 to reset the device back
> +	 * to normal mode after being used as a subordinate channel.
> +	 */
> +	if (channel > S16_MAX)
> +		return -EINVAL;
> +
> +	dev->num_tc = -channel;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(netdev_set_sb_channel);
> +
>  /*
>   * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
>   * greater than real_num_tx_queues stale skbs on the qdisc must be flushed.
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 335c6a4..bd067b1 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1054,11 +1054,23 @@ static ssize_t traffic_class_show(struct netdev_queue *queue,
>  		return -ENOENT;
>  
>  	index = get_netdev_queue_index(queue);
> +
> +	/* If queue belongs to subordinate dev use its tc mapping */
> +	dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
>  	tc = netdev_txq_to_tc(dev, index);
>  	if (tc < 0)
>  		return -EINVAL;
>  
> -	return sprintf(buf, "%u\n", tc);
> +	/* We can report the traffic class one of two ways:
> +	 * Subordinate device traffic classes are reported with the traffic
> +	 * class first, and then the subordinate class so for example TC0 on
> +	 * subordinate device 2 will be reported as "0-2". If the queue
> +	 * belongs to the root device it will be reported with just the
> +	 * traffic class, so just "0" for TC 0 for example.
> +	 */
> +	return dev->num_tc < 0 ? sprintf(buf, "%u%d\n", tc, dev->num_tc) :
> +				 sprintf(buf, "%u\n", tc);
>  }
>  
>  #ifdef CONFIG_XPS
> @@ -1225,7 +1237,14 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
>  	index = get_netdev_queue_index(queue);
>  
>  	if (dev->num_tc) {
> +		/* Do not allow XPS on subordinate device directly */
>  		num_tc = dev->num_tc;
> +		if (num_tc < 0)
> +			return -EINVAL;
> +
> +		/* If queue belongs to subordinate dev use its map */
> +		dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
>  		tc = netdev_txq_to_tc(dev, index);
>  		if (tc < 0)
>  			return -EINVAL;
> 
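
As an aside on the traffic_class_show() hunk quoted above: the "%u%d"
format only yields "0-2" because num_tc is stored as the negated channel
number, so the minus sign doubles as the separator. A minimal userspace
sketch of that formatting follows. struct tc_report and
format_traffic_class() are hypothetical names for illustration, not part
of the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the two values traffic_class_show() consults. */
struct tc_report {
	unsigned int tc;	/* traffic class of the queue */
	int num_tc;		/* < 0: subordinate channel, stored as -channel */
};

/*
 * Format the report the way the quoted hunk does. For a subordinate
 * device the negative num_tc supplies the '-' between the two numbers.
 */
static int format_traffic_class(char *buf, size_t len,
				const struct tc_report *r)
{
	if (r->num_tc < 0)
		return snprintf(buf, len, "%u%d\n", r->tc, r->num_tc);

	return snprintf(buf, len, "%u\n", r->tc);
}
```

With tc = 0 and num_tc = -2 the buffer holds "0-2", and a root device
with a non-negative num_tc gets just the traffic class.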


-- 
Florian

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
From: Stephen Hemminger @ 2018-06-12 17:50 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com>

On Tue, 12 Jun 2018 11:18:25 -0400
Alexander Duyck <alexander.h.duyck@intel.com> wrote:

> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
> 
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interface as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.
> 
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
> 
> The big changes in this set are:
>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>   Disable XPS for single queue devices
>   Replace accel_priv with sb_dev in ndo_select_queue
>   Add sb_dev parameter to fallback function for ndo_select_queue
>   Consolidated ndo_select_queue functions that appeared to be duplicates
> 
> v2: Implement generic "select_queue" functions instead of "fallback" functions.
>     Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
> 
> ---
> 
> Alexander Duyck (7):
>       net-sysfs: Drop support for XPS and traffic_class on single queue device
>       net: Add support for subordinate device traffic classes
>       ixgbe: Add code to populate and use macvlan tc to Tx queue map
>       net: Add support for subordinate traffic classes to netdev_pick_tx
>       net: Add generic ndo_select_queue functions
>       net: allow ndo_select_queue to pass netdev
>       net: allow fallback function to pass netdev
> 
> 
>  drivers/infiniband/hw/hfi1/vnic_main.c            |    2 
>  drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |    4 -
>  drivers/net/bonding/bond_main.c                   |    3 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c      |    5 -
>  drivers/net/ethernet/broadcom/bcmsysport.c        |    6 -
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |    6 +
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |    3 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   |    5 -
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c     |    5 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   62 ++++++--
>  drivers/net/ethernet/lantiq_etop.c                |   10 -
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c        |    7 +
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |    5 -
>  drivers/net/ethernet/renesas/ravb_main.c          |    3 
>  drivers/net/ethernet/sun/ldmvsw.c                 |    3 
>  drivers/net/ethernet/sun/sunvnet.c                |    3 
>  drivers/net/ethernet/ti/netcp_core.c              |    9 -
>  drivers/net/hyperv/netvsc_drv.c                   |    6 -
>  drivers/net/macvlan.c                             |   10 -
>  drivers/net/net_failover.c                        |    7 +
>  drivers/net/team/team.c                           |    3 
>  drivers/net/tun.c                                 |    3 
>  drivers/net/wireless/marvell/mwifiex/main.c       |    3 
>  drivers/net/xen-netback/interface.c               |    4 -
>  drivers/net/xen-netfront.c                        |    3 
>  drivers/staging/netlogic/xlr_net.c                |    9 -
>  drivers/staging/rtl8188eu/os_dep/os_intfs.c       |    3 
>  drivers/staging/rtl8723bs/os_dep/os_intfs.c       |    7 -
>  include/linux/netdevice.h                         |   34 ++++-
>  net/core/dev.c                                    |  156 ++++++++++++++++++---
>  net/core/net-sysfs.c                              |   36 ++++-
>  net/mac80211/iface.c                              |    4 -
>  net/packet/af_packet.c                            |    7 +
>  35 files changed, 312 insertions(+), 130 deletions(-)
> 
> --

This makes sense. I thought you were hoping to get rid of select queue in future?

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
From: Florian Fainelli @ 2018-06-12 17:56 UTC (permalink / raw)
  To: Alexander Duyck, intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com>

On 06/12/2018 08:18 AM, Alexander Duyck wrote:
> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
> 
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interface as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.

Interesting, so at some point I had come up with the following for
mapping queues between the DSA slave network devices and the DSA master
network device (doing the actual transmission). The DSA master network
device driver is just a normal network device driver.

The set-up is as follows: 4 external Ethernet switch ports, each with 8
egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet
controller has 32 output queues, so you can do a 1:1 mapping of those,
that's actually what we want. A subsequent hardware generation only
provides 16 output queues, so we can still do 2:1 mapping.

The implementation is done like this:

- DSA slave network devices are always created after the DSA master
network device so we can leverage that

- a specific notifier is running from the DSA core and tells the DSA
master about the switch position in the tree (position 0 = directly
attached), and the switch port number and a pointer to the slave network
device

- we establish the mapping between the queues within the bcmsysport
driver as a simple array

- when transmitting, DSA slave network devices set a specific queue/port
number within the 16-bits that skb->queue_mapping permits

- this gets re-used by bcmsysport.c to extract the correct queue number
during ndo_select_queue such that the appropriate queue number gets used
and congestion works end-to-end.

The reason why we do that is because there is some out-of-band HW that
monitors the queue depth of the switch port's egress queue and
back-pressures the Ethernet controller directly when it tries to transmit
to a congested queue.
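
To make the steps above concrete, here is a rough userspace sketch of
packing a switch port and egress queue into the 16 bits available in
skb->queue_mapping, and of flattening that pair into a master Tx queue
index. All ex_* names, the high-byte/low-byte split and the modulo
folding for the 16-queue generation are assumptions for illustration
only; the actual driver keeps the mapping in an array.

```c
#include <stdint.h>

/* Assumed layout: switch port in the high byte, egress queue in the low byte. */
#define EX_QM_PORT_SHIFT	8
#define EX_QM_QUEUE_MASK	0xffu
#define EX_QUEUES_PER_PORT	8u	/* 8 egress queues per switch port */

/* Slave side: encode port and queue into a 16-bit queue_mapping value. */
static inline uint16_t ex_pack_port_queue(unsigned int port, unsigned int queue)
{
	return (uint16_t)((port << EX_QM_PORT_SHIFT) |
			  (queue & EX_QM_QUEUE_MASK));
}

/* Master side: recover the two fields again. */
static inline unsigned int ex_port(uint16_t qm)
{
	return qm >> EX_QM_PORT_SHIFT;
}

static inline unsigned int ex_queue(uint16_t qm)
{
	return qm & EX_QM_QUEUE_MASK;
}

/*
 * Flatten (port, queue) into a master Tx queue index. With 32 master
 * queues this is 1:1 for 4 ports x 8 queues; with 16 master queues the
 * modulo folds two switch queues onto each master queue (2:1).
 */
static inline unsigned int ex_master_txq(uint16_t qm,
					 unsigned int num_master_queues)
{
	unsigned int flat = ex_port(qm) * EX_QUEUES_PER_PORT + ex_queue(qm);

	return flat % num_master_queues;
}
```

In this sketch port 3, queue 5 lands on master queue 29 when 32 queues
are available and on queue 13 when only 16 are.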

I had initially considered establishing the mapping using tc and some
custom "bind" argument of some kind, but ended up doing things the way
they are now, which is more automatic though it leaves less configuration
to the user. This has a number of caveats though:

- this is made generic within the context of DSA in that nothing is
switch driver or Ethernet MAC driver specific and the notifier
represents the contract between these two seemingly independent subsystems

- the queue indicated between DSA slave and master is unfortunately
switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE,
BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE)

What I like about your patchset is the mapping establishment, but as you
will read from my reply in patch 2, I think the (upper) 1:N (lower)
mapping might not work for my specific use case.

Anyhow, not intended to be blocking this, as it seems to be going in the
right direction anyway.

> 
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
> 
> The big changes in this set are:
>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>   Disable XPS for single queue devices
>   Replace accel_priv with sb_dev in ndo_select_queue
>   Add sb_dev parameter to fallback function for ndo_select_queue
>   Consolidated ndo_select_queue functions that appeared to be duplicates

Interesting, turns out I had a possibly similar use case with DSA with
the slave network devices need to select an outgoing queue number for

> 
> v2: Implement generic "select_queue" functions instead of "fallback" functions.
>     Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
> 
> ---
> 
> Alexander Duyck (7):
>       net-sysfs: Drop support for XPS and traffic_class on single queue device
>       net: Add support for subordinate device traffic classes
>       ixgbe: Add code to populate and use macvlan tc to Tx queue map
>       net: Add support for subordinate traffic classes to netdev_pick_tx
>       net: Add generic ndo_select_queue functions
>       net: allow ndo_select_queue to pass netdev
>       net: allow fallback function to pass netdev
> 
> 
>  drivers/infiniband/hw/hfi1/vnic_main.c            |    2 
>  drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |    4 -
>  drivers/net/bonding/bond_main.c                   |    3 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c      |    5 -
>  drivers/net/ethernet/broadcom/bcmsysport.c        |    6 -
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |    6 +
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |    3 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   |    5 -
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c     |    5 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   62 ++++++--
>  drivers/net/ethernet/lantiq_etop.c                |   10 -
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c        |    7 +
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |    5 -
>  drivers/net/ethernet/renesas/ravb_main.c          |    3 
>  drivers/net/ethernet/sun/ldmvsw.c                 |    3 
>  drivers/net/ethernet/sun/sunvnet.c                |    3 
>  drivers/net/ethernet/ti/netcp_core.c              |    9 -
>  drivers/net/hyperv/netvsc_drv.c                   |    6 -
>  drivers/net/macvlan.c                             |   10 -
>  drivers/net/net_failover.c                        |    7 +
>  drivers/net/team/team.c                           |    3 
>  drivers/net/tun.c                                 |    3 
>  drivers/net/wireless/marvell/mwifiex/main.c       |    3 
>  drivers/net/xen-netback/interface.c               |    4 -
>  drivers/net/xen-netfront.c                        |    3 
>  drivers/staging/netlogic/xlr_net.c                |    9 -
>  drivers/staging/rtl8188eu/os_dep/os_intfs.c       |    3 
>  drivers/staging/rtl8723bs/os_dep/os_intfs.c       |    7 -
>  include/linux/netdevice.h                         |   34 ++++-
>  net/core/dev.c                                    |  156 ++++++++++++++++++---
>  net/core/net-sysfs.c                              |   36 ++++-
>  net/mac80211/iface.c                              |    4 -
>  net/packet/af_packet.c                            |    7 +
>  35 files changed, 312 insertions(+), 130 deletions(-)
> 
> --
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 18:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
	linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <0c93c72a3a339f3479f82de04223315671e07863.camel@wdc.com>

On Tue, Jun 12, 2018 at 04:32:03PM +0000, Bart Van Assche wrote:
> On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > > Please introduce functions in the target core for allocating and freeing a tag
> > > instead of spreading the knowledge of how to allocate and free tags over all
> > > target drivers.
> > 
> > I can't without doing an unreasonably large amount of work on drivers that
> > I have no way to test.  Some of the drivers have the se_cmd already; some
> > of them don't.  I'd be happy to introduce a common function for freeing
> > a tag.
> 
> Which target drivers are you referring to? If you are referring to the sbp driver:
> I think that driver is dead and can be removed from the kernel tree. I even don't
> know whether that driver ever has had any users other than the developer of that
> driver.

For example tcm_fc:

        tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
        if (tag < 0)
                goto busy;

        cmd = &((struct ft_cmd *)se_sess->sess_cmd_map)[tag];

or qla2xxx:

        tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
        if (tag < 0)
                return NULL;

        cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];

The core doesn't know at what offset from the pointer to store the tag
& cpu.  Only the individual drivers know their cmd layout.
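
A stripped-down userspace sketch of that point, with hypothetical ex_*
structures standing in for se_cmd and two driver command types: the core
struct sits at a different offset in each driver's command, and the
stride of the command map is the driver's own struct size, so only the
driver can turn a tag into a command and record the tag and cpu in it.

```c
#include <stddef.h>

/* Stand-in for the core's per-command state; only the fields discussed here. */
struct ex_se_cmd {
	unsigned int map_tag;
	int map_cpu;
};

/* Hypothetical driver A: core struct first. */
struct ex_drv_a_cmd {
	struct ex_se_cmd se_cmd;
	int drv_private;
};

/* Hypothetical driver B: core struct after driver-private state. */
struct ex_drv_b_cmd {
	long drv_state[4];
	struct ex_se_cmd se_cmd;
};

/*
 * Driver-side lookup: the map is an opaque pointer to the core, so the
 * cast (and therefore the element size and the se_cmd offset) has to
 * come from the driver.
 */
static struct ex_drv_b_cmd *ex_drv_b_cmd_from_tag(void *cmd_map, int tag,
						  int cpu)
{
	struct ex_drv_b_cmd *cmd = &((struct ex_drv_b_cmd *)cmd_map)[tag];

	cmd->se_cmd.map_tag = (unsigned int)tag;
	cmd->se_cmd.map_cpu = cpu;
	return cmd;
}
```

Freeing is different: by then every driver has an se_cmd in hand, which
is why a common free helper is easy and a common allocate helper is not.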

^ permalink raw reply

* Re: [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: David Miller @ 2018-06-12 18:10 UTC (permalink / raw)
  To: bart.vanassche; +Cc: netdev, maze, edumazet
In-Reply-To: <20180612170555.11733-1-bart.vanassche@wdc.com>

From: Bart Van Assche <bart.vanassche@wdc.com>
Date: Tue, 12 Jun 2018 10:05:55 -0700

> Revert the patch mentioned in the subject because it breaks at least
> the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
> daemon to fail to start:
> 
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
> 
> Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>

Applied, thanks.

I held off on submitting the reverted patch to -stable, and have now
thus removed it from my -stable queue.

^ permalink raw reply

* Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
From: Andrew Lunn @ 2018-06-12 18:11 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Don Bollinger, Arnd Bergmann, Greg Kroah-Hartman, linux-kernel,
	brandon_chuang, wally_wang, roy_lee, rick_burchett, quentin.chang,
	jeffrey.townsend, scotte, roopa, David Ahern, luke.williams,
	Guohan Lu, Russell King, netdev@vger.kernel.org
In-Reply-To: <496e06b9-9f02-c4ae-4156-ab6221ba23fd@amd.com>

> There's an SFP driver under drivers/net/phy.  Can that driver be extended
> to provide this support?  Adding Russell King who developed sfp.c, as well
> as the netdev mailing list.

I agree, the current SFP code should be used.

My observations seem to be there are two different ways {Q}SFP are used:

1) The Linux kernel has full control, as assumed by the devlink/SFP
framework. We parse the SFP data to find the capabilities of the SFP
and use it to program the MAC to use the correct mode. The MAC can be
a NIC, but it can also be a switch. DSA is gaining support for
PHYLINK, so SFP modules should just work with most switches which DSA
supports.  And there is no reason a plain switchdev switch cannot use
PHYLINK.

2) Firmware is in control of the PHY layer, but there is a wish to
expose some of the data which is available via i2c from the {Q}SFP to
linux.

It appears that optoe supports this second case. It does not appear to
support any in-kernel API to actually make use of the SFP data in the
kernel.

We should not be duplicating code. We should share the SFP code for
both use cases above. There is also a Linux standard API for getting
access to this information. ethtool -m/--module-info. Anything which
is exporting {Q}SFP data needs to use this API.

   Andrew

^ permalink raw reply

* Re: [PATCH bpf v3] tools/bpftool: fix a bug in bpftool perf
From: Jakub Kicinski @ 2018-06-12 18:15 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180612053548.901931-1-yhs@fb.com>

On Mon, 11 Jun 2018 22:35:48 -0700, Yonghong Song wrote:
> Commit b04df400c302 ("tools/bpftool: add perf subcommand")
> introduced bpftool subcommand perf to query bpf program
> kuprobe and tracepoint attachments.
> 
> The perf subcommand will first test whether bpf subcommand
> BPF_TASK_FD_QUERY is supported in kernel or not. It does it
> by opening a file with argv[0] and feeds the file descriptor
> and current task pid to the kernel for querying.
> 
> Such an approach won't work if the argv[0] cannot be opened
> successfully in the current directory. This is especially
> true when bpftool is accessible through PATH env variable.
> The error below reflects the open failure for file argv[0]
> at home directory.
> 
>   [yhs@localhost ~]$ which bpftool
>   /usr/local/sbin/bpftool
>   [yhs@localhost ~]$ bpftool perf
>   Error: perf_query_support: No such file or directory
> 
> To fix the issue, let us open root directory ("/")
> which exists in every linux system. With the fix, the
> error message will correctly reflect the permission issue.
> 
>   [yhs@localhost ~]$ which bpftool
>   /usr/local/sbin/bpftool
>   [yhs@localhost ~]$ bpftool perf
>   Error: perf_query_support: Operation not permitted
>   HINT: non root or kernel doesn't support TASK_FD_QUERY
> 
> Fixes: b04df400c302 ("tools/bpftool: add perf subcommand")
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

FWIW :)
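
For anyone skimming, the failure mode in the quoted commit message boils
down to a relative open(): a bare program name such as "bpftool" is
resolved against the current directory, not PATH, whereas "/" can be
opened from anywhere. A minimal sketch, where ex_open_probe_fd() is a
hypothetical helper and not the function in the patch:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Return a throwaway read-only fd to hand to the query, or -1 on
 * failure. Passing "/" works regardless of the current directory;
 * passing a bare argv[0] only works if that name exists in the cwd.
 */
static int ex_open_probe_fd(const char *path)
{
	return open(path, O_RDONLY);
}
```

The caller is expected to close() the descriptor once the probe is done.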

^ permalink raw reply

* Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Tobias Hommel @ 2018-06-12 18:39 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Steffen Klassert, Markus Berner, Network Development
In-Reply-To: <CAKfDRXiq2c2ruvT8XoXGQntHYccAOp0zUZ3uH4iJM3cSAQkNVw@mail.gmail.com>

On Fri, Jun 08, 2018 at 10:41:37AM +0200, Kristian Evensen wrote:
> Hi,
> 
> On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel <netdev-list@genoetigt.de> wrote:
> > Sorry no progress until now, I currently do not get time to have a deeper look
> > into that. We're back to 4.1.6 right now.
> 
> Thanks for letting me know. In the project I am currently involved in,
> we unfortunately don't have the option of reverting the kernel, so we
> are finding ways to live with the error. We have been looking into the
> error a bit more, and have made the following observations:
> 
> * First of all, as discussed earlier in the thread, the error is
> triggered by dst_orig being NULL. Our current work-around is just to
> return from xfrm_lookup if dst_orig is NULL and this seems to work
> fine, the error doesn't happen that often (in our use-cases at least).
> * The machine we use for testing (and where we first saw the error) is
> used as initiator.
The machine where I encountered the bug is a "roadwarrior gateway", so it only
serves as a responder.

> * When we compare the logs from Strongswan with the ones from the
> kernel, it seems that the error is typically triggered when a tunnel
> is torn down/about to come up. We need quite a lot of tunnels for
> the error to trigger, usually around 30+. I guess this might point to
> some race or some condition not being met when packets are
> sent/received.
> * We see the error much more frequently when hardware encryption is enabled.
> * Yesterday, we upgraded the kernel from 4.14.34 to 4.14.48, and the
> error happens much less frequently. I see that 4.14.48 includes
> several IPsec fixes (for example the previously mentioned ("xfrm: Fix
> a race in the xdst pcpu cache.")).
> 
> BR,
> Kristian

^ permalink raw reply

* [PATCH iproute2-next v2] ip-xfrm: Add support for OUTPUT_MARK
From: Subash Abhinov Kasiviswanathan @ 2018-06-12 18:48 UTC (permalink / raw)
  To: lorenzo, netdev, stephen, dsahern, steffen.klassert
  Cc: Subash Abhinov Kasiviswanathan

This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output with output-mark -

src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 ip/ipxfrm.c        | 6 ++++++
 ip/xfrm_state.c    | 9 +++++++++
 man/man8/ip-xfrm.8 | 2 ++
 3 files changed, 17 insertions(+)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 12c2f72..8b88c8f 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -681,6 +681,12 @@ void xfrm_xfrma_print(struct rtattr *tb[], __u16 family,
 		fprintf(fp, "%s", _SL_);
 	}
 
+	if (tb[XFRMA_OUTPUT_MARK]) {
+		__u32 output_mark = rta_getattr_u32(tb[XFRMA_OUTPUT_MARK]);
+
+		fprintf(fp, "\toutput-mark 0x%x %s", output_mark, _SL_);
+	}
+
 	if (tb[XFRMA_ALG_AUTH] && !tb[XFRMA_ALG_AUTH_TRUNC]) {
 		struct rtattr *rta = tb[XFRMA_ALG_AUTH];
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 85d959c..d005802 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -61,6 +61,7 @@ static void usage(void)
 	fprintf(stderr, "        [ flag FLAG-LIST ] [ sel SELECTOR ] [ LIMIT-LIST ] [ encap ENCAP ]\n");
 	fprintf(stderr, "        [ coa ADDR[/PLEN] ] [ ctx CTX ] [ extra-flag EXTRA-FLAG-LIST ]\n");
 	fprintf(stderr, "        [ offload [dev DEV] dir DIR ]\n");
+	fprintf(stderr, "        [ output-mark OUTPUT-MARK]\n");
 	fprintf(stderr, "Usage: ip xfrm state allocspi ID [ mode MODE ] [ mark MARK [ mask MASK ] ]\n");
 	fprintf(stderr, "        [ reqid REQID ] [ seq SEQ ] [ min SPI max SPI ]\n");
 	fprintf(stderr, "Usage: ip xfrm state { delete | get } ID [ mark MARK [ mask MASK ] ]\n");
@@ -322,6 +323,7 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 		struct xfrm_user_sec_ctx sctx;
 		char    str[CTX_BUF_SIZE];
 	} ctx = {};
+	__u32 output_mark = 0;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "mode") == 0) {
@@ -437,6 +439,10 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 				invarg("value after \"offload dir\" is invalid", *argv);
 				is_offload = false;
 			}
+		} else if (strcmp(*argv, "output-mark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&output_mark, *argv, 0))
+				invarg("value after \"output-mark\" is invalid", *argv);
 		} else {
 			/* try to assume ALGO */
 			int type = xfrm_algotype_getbyname(*argv);
@@ -720,6 +726,9 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 		}
 	}
 
+	if (output_mark != 0)
+		addattr32(&req.n, sizeof(req.buf), XFRMA_OUTPUT_MARK, output_mark);
+
 	if (rtnl_open_byproto(&rth, 0, NETLINK_XFRM) < 0)
 		exit(1);
 
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 988cc6a..e001596 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -59,6 +59,8 @@ ip-xfrm \- transform configuration
 .IR CTX " ]"
 .RB "[ " extra-flag
 .IR EXTRA-FLAG-LIST " ]"
+.RB "[ " output-mark
+.IR OUTPUT-MARK " ]"
 
 .ti -8
 .B "ip xfrm state allocspi"
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH iproute2-next] ip-xfrm: Add support for OUTPUT_MARK
From: Subash Abhinov Kasiviswanathan @ 2018-06-12 18:51 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: netdev, Stephen Hemminger, David Ahern, Steffen Klassert
In-Reply-To: <CAKD1Yr0Z8ZgyE=b2MXtGOaJSRm0Y8spnU2pDxuWLd5FFgfx=eQ@mail.gmail.com>

> Have you considered putting this earlier up in the output, where the
> mark is printed as well?
> 
>> +       if (tb[XFRMA_OUTPUT_MARK]) {
>> +               __u32 output_mark = 
>> rta_getattr_u32(tb[XFRMA_OUTPUT_MARK]);
>> +
>> +               fprintf(fp, "\toutput-mark 0x%x %s", output_mark, 
>> _SL_);
>> +       }
>>  }
> 
> If you wanted to implement the suggestion above, I think you could do
> that by moving this code into xfrm_xfrma_print.
> 

Hi Lorenzo

I have updated it now in v2.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply

* [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox

Removing the percpu_ida code nets over 400 lines of removal.  It's not
as spectacular as deleting an entire architecture, but it's still a
worthy reduction in lines of code.

Untested due to lack of hardware and not understanding how to set up a
target platform.

Changes from v1:
 - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
 - Abstracted out tag freeing as requested by Bart
 - Made iscsit_wait_for_tag static as pointed out by 0day

Matthew Wilcox (3):
  target: Abstract tag freeing
  Convert target drivers to use sbitmap
  Remove percpu_ida

 drivers/scsi/qla2xxx/qla_target.c        |  14 +-
 drivers/target/iscsi/iscsi_target_util.c |  35 ++-
 drivers/target/sbp/sbp_target.c          |   7 +-
 drivers/target/target_core_transport.c   |   5 +-
 drivers/target/tcm_fc/tfc_cmd.c          |  10 +-
 drivers/usb/gadget/function/f_tcm.c      |   7 +-
 drivers/vhost/scsi.c                     |   8 +-
 drivers/xen/xen-scsiback.c               |   9 +-
 include/linux/percpu_ida.h               |  83 -----
 include/target/iscsi/iscsi_target_core.h |   1 +
 include/target/target_core_base.h        |  10 +-
 lib/Makefile                             |   2 +-
 lib/percpu_ida.c                         | 370 -----------------------
 13 files changed, 73 insertions(+), 488 deletions(-)
 delete mode 100644 include/linux/percpu_ida.h
 delete mode 100644 lib/percpu_ida.c

-- 
2.17.1

^ permalink raw reply

* [PATCH 1/3] target: Abstract tag freeing
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

Introduce target_free_tag() and convert all drivers to use it.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
 drivers/scsi/qla2xxx/qla_target.c        | 4 ++--
 drivers/target/iscsi/iscsi_target_util.c | 2 +-
 drivers/target/sbp/sbp_target.c          | 2 +-
 drivers/target/tcm_fc/tfc_cmd.c          | 4 ++--
 drivers/usb/gadget/function/f_tcm.c      | 2 +-
 drivers/vhost/scsi.c                     | 2 +-
 drivers/xen/xen-scsiback.c               | 4 +---
 include/target/target_core_base.h        | 5 +++++
 8 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index b85c833099ff..05290966e630 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -3783,7 +3783,7 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
 		return;
 	}
 	cmd->jiffies_at_free = get_jiffies_64();
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 }
 EXPORT_SYMBOL(qlt_free_cmd);
 
@@ -4146,7 +4146,7 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
 	qlt_send_term_exchange(qpair, NULL, &cmd->atio, 1, 0);
 
 	qlt_decr_num_pend_cmds(vha);
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 	spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
 
 	spin_lock_irqsave(&ha->tgt.sess_lock, flags);
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 4435bf374d2d..7e98697cfb8e 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -711,7 +711,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
 	kfree(cmd->iov_data);
 	kfree(cmd->text_in_ptr);
 
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(sess->se_sess, se_cmd);
 }
 EXPORT_SYMBOL(iscsit_release_cmd);
 
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index fb1003921d85..679ae29d25ab 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -1460,7 +1460,7 @@ static void sbp_free_request(struct sbp_target_request *req)
 	kfree(req->pg_tbl);
 	kfree(req->cmd_buf);
 
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static void sbp_mgt_agent_process(struct work_struct *work)
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index ec372860106f..13e4efbe1ce7 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -92,7 +92,7 @@ static void ft_free_cmd(struct ft_cmd *cmd)
 	if (fr_seq(fp))
 		fc_seq_release(fr_seq(fp));
 	fc_frame_free(fp);
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 	ft_sess_put(sess);	/* undo get from lookup at recv */
 }
 
@@ -461,7 +461,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	cmd->sess = sess;
 	cmd->seq = fc_seq_assign(lport, fp);
 	if (!cmd->seq) {
-		percpu_ida_free(&se_sess->sess_tag_pool, tag);
+		target_free_tag(se_sess, &cmd->se_cmd);
 		goto busy;
 	}
 	cmd->req_frame = fp;		/* hold frame during cmd */
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index d78dbb73bde8..9f670d9224b9 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1288,7 +1288,7 @@ static void usbg_release_cmd(struct se_cmd *se_cmd)
 	struct se_session *se_sess = se_cmd->se_sess;
 
 	kfree(cmd->data_buf);
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static u32 usbg_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad57094d736..70d35e696533 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -324,7 +324,7 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	}
 
 	vhost_scsi_put_inflight(tv_cmd->inflight);
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 7bc88fd43cfc..ec6635258ed8 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -1377,9 +1377,7 @@ static int scsiback_check_stop_free(struct se_cmd *se_cmd)
 
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
-	struct se_session *se_sess = se_cmd->se_sess;
-
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_cmd->se_sess, se_cmd);
 }
 
 static u32 scsiback_sess_get_index(struct se_session *se_sess)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 922a39f45abc..260c2f3e9460 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -934,4 +934,9 @@ static inline void atomic_dec_mb(atomic_t *v)
 	smp_mb__after_atomic();
 }
 
+static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
+{
+	percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+}
+
 #endif /* TARGET_CORE_BASE_H */
-- 
2.17.1


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


* [PATCH 2/3] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

The sbitmap and the percpu_ida perform essentially the same task,
allocating tags for commands.  The sbitmap outperforms the percpu_ida
as documented here: https://lkml.org/lkml/2014/4/22/553

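The sbitmap interface is a little harder to use: callers must remember
the CPU returned by sbitmap_queue_get() and pass it back to
sbitmap_queue_clear(), and iSCSI needs its own wait loop because
sbitmap_queue_get() does not wait for a free tag.  Removing the
percpu_ida code and getting better performance justify the additional
complexity.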

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>	# f_tcm
---
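Note for readers (not part of the commit): with this patch a tag travels
together with the CPU it was allocated on, and both are handed back to
the pool on free.  Below is a minimal userspace sketch of that pairing.
Everything prefixed toy_ is hypothetical and only stands in for
sbitmap_queue_get() / sbitmap_queue_clear(); the caller passes the CPU
number explicitly because userspace has no get_cpu(), and the hint
handling is a simplification of what I believe the real sbitmap does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes for the sketch; the kernel derives these at runtime. */
#define TOY_NR_TAGS 8
#define TOY_NR_CPUS 4

/* Toy stand-in for se_sess->sess_tag_pool (NOT the kernel sbitmap_queue). */
struct toy_tag_pool {
	bool in_use[TOY_NR_TAGS];
	unsigned int hint[TOY_NR_CPUS];	/* where each "cpu" starts searching */
};

/* Only the two se_cmd fields this patch cares about. */
struct toy_cmd {
	unsigned int map_tag;
	int map_cpu;
};

/* Never sleeps: returns a free tag, or -1 when the pool is exhausted. */
static int toy_tag_get(struct toy_tag_pool *pool, int cpu, int *cpup)
{
	unsigned int i, tag;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	for (i = 0; i < TOY_NR_TAGS; i++) {
		tag = (pool->hint[cpu] + i) % TOY_NR_TAGS;
		if (!pool->in_use[tag]) {
			pool->in_use[tag] = true;
			pool->hint[cpu] = (tag + 1) % TOY_NR_TAGS;
			*cpup = cpu;	/* caller must keep this for the free */
			return (int)tag;
		}
	}
	return -1;
}

/* Free needs the tag AND the allocating cpu, so that cpu's hint is updated. */
static void toy_tag_clear(struct toy_tag_pool *pool, unsigned int tag, int cpu)
{
	assert(tag < TOY_NR_TAGS);
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	pool->in_use[tag] = false;
	pool->hint[cpu] = tag;
}

/* Same shape as the per-driver get-tag helpers touched by this patch. */
static struct toy_cmd *toy_cmd_alloc(struct toy_tag_pool *pool,
				     struct toy_cmd *map, int cpu)
{
	struct toy_cmd *cmd;
	int tag, got_cpu;

	tag = toy_tag_get(pool, cpu, &got_cpu);
	if (tag < 0)
		return NULL;

	cmd = &map[tag];
	cmd->map_tag = (unsigned int)tag;
	cmd->map_cpu = got_cpu;
	return cmd;
}

/* Same shape as target_free_tag() after this patch. */
static void toy_cmd_free(struct toy_tag_pool *pool, struct toy_cmd *cmd)
{
	toy_tag_clear(pool, cmd->map_tag, cmd->map_cpu);
}
```

In the sketch, forgetting to store map_cpu at allocation time would make
toy_cmd_free() update the wrong CPU's hint, which is why every converted
driver below gains a "map_cpu = cpu" assignment next to "map_tag = tag".
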
 drivers/scsi/qla2xxx/qla_target.c        | 10 ++++---
 drivers/target/iscsi/iscsi_target_util.c | 33 +++++++++++++++++++++---
 drivers/target/sbp/sbp_target.c          |  5 ++--
 drivers/target/target_core_transport.c   |  5 ++--
 drivers/target/tcm_fc/tfc_cmd.c          |  6 ++---
 drivers/usb/gadget/function/f_tcm.c      |  5 ++--
 drivers/vhost/scsi.c                     |  6 ++---
 drivers/xen/xen-scsiback.c               |  5 ++--
 include/target/iscsi/iscsi_target_core.h |  1 +
 include/target/target_core_base.h        |  7 ++---
 10 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 05290966e630..a1725a054749 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -4277,9 +4277,9 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
 {
 	struct se_session *se_sess = sess->se_sess;
 	struct qla_tgt_cmd *cmd;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return NULL;
 
@@ -4292,6 +4292,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
 	qlt_incr_num_pend_cmds(vha);
 	cmd->vha = vha;
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->sess = sess;
 	cmd->loop_id = sess->loop_id;
 	cmd->conf_compl_supported = sess->conf_compl_supported;
@@ -5294,7 +5295,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 	struct fc_port *sess;
 	struct se_session *se_sess;
 	struct qla_tgt_cmd *cmd;
-	int tag;
+	int tag, cpu;
 	unsigned long flags;
 
 	if (unlikely(tgt->tgt_stop)) {
@@ -5326,7 +5327,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 
 	se_sess = sess->se_sess;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return;
 
@@ -5357,6 +5358,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 	cmd->reset_count = ha->base_qpair->chip_reset;
 	cmd->q_full = 1;
 	cmd->qpair = ha->base_qpair;
+	cmd->se_cmd.map_cpu = cpu;
 
 	if (qfull) {
 		cmd->q_full = 1;
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 7e98697cfb8e..8cfcf9033507 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -17,7 +17,7 @@
  ******************************************************************************/
 
 #include <linux/list.h>
-#include <linux/percpu_ida.h>
+#include <linux/sched/signal.h>
 #include <net/ipv6.h>         /* ipv6_addr_equal() */
 #include <scsi/scsi_tcq.h>
 #include <scsi/iscsi_proto.h>
@@ -147,6 +147,30 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
 	spin_unlock_bh(&cmd->r2t_lock);
 }
 
+static int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
+{
+	int tag = -1;
+	DEFINE_WAIT(wait);
+	struct sbq_wait_state *ws;
+
+	if (state == TASK_RUNNING)
+		return tag;
+
+	ws = &se_sess->sess_tag_pool.ws[0];
+	for (;;) {
+		prepare_to_wait_exclusive(&ws->wait, &wait, state);
+		if (signal_pending_state(state, current))
+			break;
+		tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
+		if (tag >= 0)
+			break;
+		schedule();
+	}
+
+	finish_wait(&ws->wait, &wait);
+	return tag;
+}
+
 /*
  * May be called from software interrupt (timer) context for allocating
  * iSCSI NopINs.
@@ -155,9 +179,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
 {
 	struct iscsi_cmd *cmd;
 	struct se_session *se_sess = conn->sess->se_sess;
-	int size, tag;
+	int size, tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+	if (tag < 0)
+		tag = iscsit_wait_for_tag(se_sess, state, &cpu);
 	if (tag < 0)
 		return NULL;
 
@@ -166,6 +192,7 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
 	memset(cmd, 0, size);
 
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->conn = conn;
 	cmd->data_direction = DMA_NONE;
 	INIT_LIST_HEAD(&cmd->i_conn_node);
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index 679ae29d25ab..42b21f2ac8b0 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -926,15 +926,16 @@ static struct sbp_target_request *sbp_mgt_get_req(struct sbp_session *sess,
 {
 	struct se_session *se_sess = sess->se_sess;
 	struct sbp_target_request *req;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return ERR_PTR(-ENOMEM);
 
 	req = &((struct sbp_target_request *)se_sess->sess_cmd_map)[tag];
 	memset(req, 0, sizeof(*req));
 	req->se_cmd.map_tag = tag;
+	req->se_cmd.map_cpu = cpu;
 	req->se_cmd.tag = next_orb;
 
 	return req;
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index f0e8f0f4ccb4..18c53c5cdd3d 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -260,7 +260,8 @@ int transport_alloc_session_tags(struct se_session *se_sess,
 		}
 	}
 
-	rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+	rc = sbitmap_queue_init_node(&se_sess->sess_tag_pool, tag_num, -1,
+			false, GFP_KERNEL, NUMA_NO_NODE);
 	if (rc < 0) {
 		pr_err("Unable to init se_sess->sess_tag_pool,"
 			" tag_num: %u\n", tag_num);
@@ -547,7 +548,7 @@ void transport_free_session(struct se_session *se_sess)
 		target_put_nacl(se_nacl);
 	}
 	if (se_sess->sess_cmd_map) {
-		percpu_ida_destroy(&se_sess->sess_tag_pool);
+		sbitmap_queue_free(&se_sess->sess_tag_pool);
 		kvfree(se_sess->sess_cmd_map);
 	}
 	kmem_cache_free(se_sess_cache, se_sess);
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 13e4efbe1ce7..a183d4da7db2 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -28,7 +28,6 @@
 #include <linux/configfs.h>
 #include <linux/ctype.h>
 #include <linux/hash.h>
-#include <linux/percpu_ida.h>
 #include <asm/unaligned.h>
 #include <scsi/scsi_tcq.h>
 #include <scsi/libfc.h>
@@ -448,9 +447,9 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	struct ft_cmd *cmd;
 	struct fc_lport *lport = sess->tport->lport;
 	struct se_session *se_sess = sess->se_sess;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		goto busy;
 
@@ -458,6 +457,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	memset(cmd, 0, sizeof(struct ft_cmd));
 
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->sess = sess;
 	cmd->seq = fc_seq_assign(lport, fp);
 	if (!cmd->seq) {
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index 9f670d9224b9..5003e857dce7 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1071,15 +1071,16 @@ static struct usbg_cmd *usbg_get_cmd(struct f_uas *fu,
 {
 	struct se_session *se_sess = tv_nexus->tvn_se_sess;
 	struct usbg_cmd *cmd;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return ERR_PTR(-ENOMEM);
 
 	cmd = &((struct usbg_cmd *)se_sess->sess_cmd_map)[tag];
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->se_cmd.tag = cmd->tag = scsi_tag;
 	cmd->fu = fu;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 70d35e696533..c9c5d6b291cc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -46,7 +46,6 @@
 #include <linux/virtio_scsi.h>
 #include <linux/llist.h>
 #include <linux/bitmap.h>
-#include <linux/percpu_ida.h>
 
 #include "vhost.h"
 
@@ -567,7 +566,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	struct se_session *se_sess;
 	struct scatterlist *sg, *prot_sg;
 	struct page **pages;
-	int tag;
+	int tag, cpu;
 
 	tv_nexus = tpg->tpg_nexus;
 	if (!tv_nexus) {
@@ -576,7 +575,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	}
 	se_sess = tv_nexus->tvn_se_sess;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
 		return ERR_PTR(-ENOMEM);
@@ -591,6 +590,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	cmd->tvc_prot_sgl = prot_sg;
 	cmd->tvc_upages = pages;
 	cmd->tvc_se_cmd.map_tag = tag;
+	cmd->tvc_se_cmd.map_cpu = cpu;
 	cmd->tvc_tag = scsi_tag;
 	cmd->tvc_lun = lun;
 	cmd->tvc_task_attr = task_attr;
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index ec6635258ed8..764dd9aa0131 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -654,9 +654,9 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
 	struct scsiback_nexus *nexus = tpg->tpg_nexus;
 	struct se_session *se_sess = nexus->tvn_se_sess;
 	struct vscsibk_pend *req;
-	int tag, i;
+	int tag, cpu, i;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vscsiif_request\n");
 		return ERR_PTR(-ENOMEM);
@@ -665,6 +665,7 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
 	req = &((struct vscsibk_pend *)se_sess->sess_cmd_map)[tag];
 	memset(req, 0, sizeof(*req));
 	req->se_cmd.map_tag = tag;
+	req->se_cmd.map_cpu = cpu;
 
 	for (i = 0; i < VSCSI_MAX_GRANTS; i++)
 		req->grant_handles[i] = SCSIBACK_INVALID_HANDLE;
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index cf5f3fff1f1a..f2e6abea8490 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -4,6 +4,7 @@
 
 #include <linux/dma-direction.h>     /* enum dma_data_direction */
 #include <linux/list.h>              /* struct list_head */
+#include <linux/sched.h>
 #include <linux/socket.h>            /* struct sockaddr_storage */
 #include <linux/types.h>             /* u8 */
 #include <scsi/iscsi_proto.h>        /* itt_t */
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 260c2f3e9460..448f291125c2 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -4,7 +4,7 @@
 
 #include <linux/configfs.h>      /* struct config_group */
 #include <linux/dma-direction.h> /* enum dma_data_direction */
-#include <linux/percpu_ida.h>    /* struct percpu_ida */
+#include <linux/sbitmap.h>
 #include <linux/percpu-refcount.h>
 #include <linux/semaphore.h>     /* struct semaphore */
 #include <linux/completion.h>
@@ -455,6 +455,7 @@ struct se_cmd {
 	int			sam_task_attr;
 	/* Used for se_sess->sess_tag_pool */
 	unsigned int		map_tag;
+	int			map_cpu;
 	/* Transport protocol dependent state, see transport_state_table */
 	enum transport_state_table t_state;
 	/* See se_cmd_flags_table */
@@ -608,7 +609,7 @@ struct se_session {
 	struct list_head	sess_wait_list;
 	spinlock_t		sess_cmd_lock;
 	void			*sess_cmd_map;
-	struct percpu_ida	sess_tag_pool;
+	struct sbitmap_queue	sess_tag_pool;
 };
 
 struct se_device;
@@ -936,7 +937,7 @@ static inline void atomic_dec_mb(atomic_t *v)
 
 static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
 {
-	percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+	sbitmap_queue_clear(&sess->sess_tag_pool, cmd->map_tag, cmd->map_cpu);
 }
 
 #endif /* TARGET_CORE_BASE_H */
-- 
2.17.1



* [PATCH 3/3] Remove percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

With its one user gone, remove the library code.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
 include/linux/percpu_ida.h |  83 ---------
 lib/Makefile               |   2 +-
 lib/percpu_ida.c           | 370 -------------------------------------
 3 files changed, 1 insertion(+), 454 deletions(-)
 delete mode 100644 include/linux/percpu_ida.h
 delete mode 100644 lib/percpu_ida.c

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
deleted file mode 100644
index 07d78e4653bc..000000000000
--- a/include/linux/percpu_ida.h
+++ /dev/null
@@ -1,83 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERCPU_IDA_H__
-#define __PERCPU_IDA_H__
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/sched.h>
-#include <linux/spinlock_types.h>
-#include <linux/wait.h>
-#include <linux/cpumask.h>
-
-struct percpu_ida_cpu;
-
-struct percpu_ida {
-	/*
-	 * number of tags available to be allocated, as passed to
-	 * percpu_ida_init()
-	 */
-	unsigned			nr_tags;
-	unsigned			percpu_max_size;
-	unsigned			percpu_batch_size;
-
-	struct percpu_ida_cpu __percpu	*tag_cpu;
-
-	/*
-	 * Bitmap of cpus that (may) have tags on their percpu freelists:
-	 * steal_tags() uses this to decide when to steal tags, and which cpus
-	 * to try stealing from.
-	 *
-	 * It's ok for a freelist to be empty when its bit is set - steal_tags()
-	 * will just keep looking - but the bitmap _must_ be set whenever a
-	 * percpu freelist does have tags.
-	 */
-	cpumask_t			cpus_have_tags;
-
-	struct {
-		spinlock_t		lock;
-		/*
-		 * When we go to steal tags from another cpu (see steal_tags()),
-		 * we want to pick a cpu at random. Cycling through them every
-		 * time we steal is a bit easier and more or less equivalent:
-		 */
-		unsigned		cpu_last_stolen;
-
-		/* For sleeping on allocation failure */
-		wait_queue_head_t	wait;
-
-		/*
-		 * Global freelist - it's a stack where nr_free points to the
-		 * top
-		 */
-		unsigned		nr_free;
-		unsigned		*freelist;
-	} ____cacheline_aligned_in_smp;
-};
-
-/*
- * Number of tags we move between the percpu freelist and the global freelist at
- * a time
- */
-#define IDA_DEFAULT_PCPU_BATCH_MOVE	32U
-/* Max size of percpu freelist, */
-#define IDA_DEFAULT_PCPU_SIZE	((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2)
-
-int percpu_ida_alloc(struct percpu_ida *pool, int state);
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
-
-void percpu_ida_destroy(struct percpu_ida *pool);
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size);
-static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
-{
-	return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
-		IDA_DEFAULT_PCPU_BATCH_MOVE);
-}
-
-typedef int (*percpu_ida_cb)(unsigned, void *);
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
-	void *data);
-
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
-#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 84c6dcb31fbb..f4722a7fa62c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
-	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+	 percpu-refcount.o rhashtable.o reciprocal_div.o \
 	 once.o refcount.o usercopy.o errseq.o bucket_locks.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
deleted file mode 100644
index 9bbd9c5d375a..000000000000
--- a/lib/percpu_ida.c
+++ /dev/null
@@ -1,370 +0,0 @@
-/*
- * Percpu IDA library
- *
- * Copyright (C) 2013 Datera, Inc. Kent Overstreet
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/bitops.h>
-#include <linux/bug.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/percpu.h>
-#include <linux/sched/signal.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/percpu_ida.h>
-
-struct percpu_ida_cpu {
-	/*
-	 * Even though this is percpu, we need a lock for tag stealing by remote
-	 * CPUs:
-	 */
-	spinlock_t			lock;
-
-	/* nr_free/freelist form a stack of free IDs */
-	unsigned			nr_free;
-	unsigned			freelist[];
-};
-
-static inline void move_tags(unsigned *dst, unsigned *dst_nr,
-			     unsigned *src, unsigned *src_nr,
-			     unsigned nr)
-{
-	*src_nr -= nr;
-	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
-	*dst_nr += nr;
-}
-
-/*
- * Try to steal tags from a remote cpu's percpu freelist.
- *
- * We first check how many percpu freelists have tags
- *
- * Then we iterate through the cpus until we find some tags - we don't attempt
- * to find the "best" cpu to steal from, to keep cacheline bouncing to a
- * minimum.
- */
-static inline void steal_tags(struct percpu_ida *pool,
-			      struct percpu_ida_cpu *tags)
-{
-	unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
-	struct percpu_ida_cpu *remote;
-
-	for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
-	     cpus_have_tags; cpus_have_tags--) {
-		cpu = cpumask_next(cpu, &pool->cpus_have_tags);
-
-		if (cpu >= nr_cpu_ids) {
-			cpu = cpumask_first(&pool->cpus_have_tags);
-			if (cpu >= nr_cpu_ids)
-				BUG();
-		}
-
-		pool->cpu_last_stolen = cpu;
-		remote = per_cpu_ptr(pool->tag_cpu, cpu);
-
-		cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
-
-		if (remote == tags)
-			continue;
-
-		spin_lock(&remote->lock);
-
-		if (remote->nr_free) {
-			memcpy(tags->freelist,
-			       remote->freelist,
-			       sizeof(unsigned) * remote->nr_free);
-
-			tags->nr_free = remote->nr_free;
-			remote->nr_free = 0;
-		}
-
-		spin_unlock(&remote->lock);
-
-		if (tags->nr_free)
-			break;
-	}
-}
-
-/*
- * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
- * our percpu freelist:
- */
-static inline void alloc_global_tags(struct percpu_ida *pool,
-				     struct percpu_ida_cpu *tags)
-{
-	move_tags(tags->freelist, &tags->nr_free,
-		  pool->freelist, &pool->nr_free,
-		  min(pool->nr_free, pool->percpu_batch_size));
-}
-
-/**
- * percpu_ida_alloc - allocate a tag
- * @pool: pool to allocate from
- * @state: task state for prepare_to_wait
- *
- * Returns a tag - an integer in the range [0..nr_tags) (passed to
- * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
- *
- * Safe to be called from interrupt context (assuming it isn't passed
- * TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, of course).
- *
- * @gfp indicates whether or not to wait until a free id is available (it's not
- * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep
- * however long it takes until another thread frees an id (same semantics as a
- * mempool).
- *
- * Will not fail if passed TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE.
- */
-int percpu_ida_alloc(struct percpu_ida *pool, int state)
-{
-	DEFINE_WAIT(wait);
-	struct percpu_ida_cpu *tags;
-	unsigned long flags;
-	int tag = -ENOSPC;
-
-	tags = raw_cpu_ptr(pool->tag_cpu);
-	spin_lock_irqsave(&tags->lock, flags);
-
-	/* Fastpath */
-	if (likely(tags->nr_free >= 0)) {
-		tag = tags->freelist[--tags->nr_free];
-		spin_unlock_irqrestore(&tags->lock, flags);
-		return tag;
-	}
-	spin_unlock_irqrestore(&tags->lock, flags);
-
-	while (1) {
-		spin_lock_irqsave(&pool->lock, flags);
-		tags = this_cpu_ptr(pool->tag_cpu);
-
-		/*
-		 * prepare_to_wait() must come before steal_tags(), in case
-		 * percpu_ida_free() on another cpu flips a bit in
-		 * cpus_have_tags
-		 *
-		 * global lock held and irqs disabled, don't need percpu lock
-		 */
-		if (state != TASK_RUNNING)
-			prepare_to_wait(&pool->wait, &wait, state);
-
-		if (!tags->nr_free)
-			alloc_global_tags(pool, tags);
-		if (!tags->nr_free)
-			steal_tags(pool, tags);
-
-		if (tags->nr_free) {
-			tag = tags->freelist[--tags->nr_free];
-			if (tags->nr_free)
-				cpumask_set_cpu(smp_processor_id(),
-						&pool->cpus_have_tags);
-		}
-
-		spin_unlock_irqrestore(&pool->lock, flags);
-
-		if (tag >= 0 || state == TASK_RUNNING)
-			break;
-
-		if (signal_pending_state(state, current)) {
-			tag = -ERESTARTSYS;
-			break;
-		}
-
-		schedule();
-	}
-	if (state != TASK_RUNNING)
-		finish_wait(&pool->wait, &wait);
-
-	return tag;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_alloc);
-
-/**
- * percpu_ida_free - free a tag
- * @pool: pool @tag was allocated from
- * @tag: a tag previously allocated with percpu_ida_alloc()
- *
- * Safe to be called from interrupt context.
- */
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
-{
-	struct percpu_ida_cpu *tags;
-	unsigned long flags;
-	unsigned nr_free;
-
-	BUG_ON(tag >= pool->nr_tags);
-
-	tags = raw_cpu_ptr(pool->tag_cpu);
-
-	spin_lock_irqsave(&tags->lock, flags);
-	tags->freelist[tags->nr_free++] = tag;
-
-	nr_free = tags->nr_free;
-
-	if (nr_free == 1) {
-		cpumask_set_cpu(smp_processor_id(),
-				&pool->cpus_have_tags);
-		wake_up(&pool->wait);
-	}
-	spin_unlock_irqrestore(&tags->lock, flags);
-
-	if (nr_free == pool->percpu_max_size) {
-		spin_lock_irqsave(&pool->lock, flags);
-		spin_lock(&tags->lock);
-
-		if (tags->nr_free == pool->percpu_max_size) {
-			move_tags(pool->freelist, &pool->nr_free,
-				  tags->freelist, &tags->nr_free,
-				  pool->percpu_batch_size);
-
-			wake_up(&pool->wait);
-		}
-		spin_unlock(&tags->lock);
-		spin_unlock_irqrestore(&pool->lock, flags);
-	}
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free);
-
-/**
- * percpu_ida_destroy - release a tag pool's resources
- * @pool: pool to free
- *
- * Frees the resources allocated by percpu_ida_init().
- */
-void percpu_ida_destroy(struct percpu_ida *pool)
-{
-	free_percpu(pool->tag_cpu);
-	free_pages((unsigned long) pool->freelist,
-		   get_order(pool->nr_tags * sizeof(unsigned)));
-}
-EXPORT_SYMBOL_GPL(percpu_ida_destroy);
-
-/**
- * percpu_ida_init - initialize a percpu tag pool
- * @pool: pool to initialize
- * @nr_tags: number of tags that will be available for allocation
- *
- * Initializes @pool so that it can be used to allocate tags - integers in the
- * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
- * preallocated array of tag structures.
- *
- * Allocation is percpu, but sharding is limited by nr_tags - for best
- * performance, the workload should not span more cpus than nr_tags / 128.
- */
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size)
-{
-	unsigned i, cpu, order;
-
-	memset(pool, 0, sizeof(*pool));
-
-	init_waitqueue_head(&pool->wait);
-	spin_lock_init(&pool->lock);
-	pool->nr_tags = nr_tags;
-	pool->percpu_max_size = max_size;
-	pool->percpu_batch_size = batch_size;
-
-	/* Guard against overflow */
-	if (nr_tags > (unsigned) INT_MAX + 1) {
-		pr_err("percpu_ida_init(): nr_tags too large\n");
-		return -EINVAL;
-	}
-
-	order = get_order(nr_tags * sizeof(unsigned));
-	pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
-	if (!pool->freelist)
-		return -ENOMEM;
-
-	for (i = 0; i < nr_tags; i++)
-		pool->freelist[i] = i;
-
-	pool->nr_free = nr_tags;
-
-	pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
-				       pool->percpu_max_size * sizeof(unsigned),
-				       sizeof(unsigned));
-	if (!pool->tag_cpu)
-		goto err;
-
-	for_each_possible_cpu(cpu)
-		spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
-
-	return 0;
-err:
-	percpu_ida_destroy(pool);
-	return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(__percpu_ida_init);
-
-/**
- * percpu_ida_for_each_free - iterate free ids of a pool
- * @pool: pool to iterate
- * @fn: interate callback function
- * @data: parameter for @fn
- *
- * Note, this doesn't guarantee to iterate all free ids restrictly. Some free
- * ids might be missed, some might be iterated duplicated, and some might
- * be iterated and not free soon.
- */
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
-	void *data)
-{
-	unsigned long flags;
-	struct percpu_ida_cpu *remote;
-	unsigned cpu, i, err = 0;
-
-	for_each_possible_cpu(cpu) {
-		remote = per_cpu_ptr(pool->tag_cpu, cpu);
-		spin_lock_irqsave(&remote->lock, flags);
-		for (i = 0; i < remote->nr_free; i++) {
-			err = fn(remote->freelist[i], data);
-			if (err)
-				break;
-		}
-		spin_unlock_irqrestore(&remote->lock, flags);
-		if (err)
-			goto out;
-	}
-
-	spin_lock_irqsave(&pool->lock, flags);
-	for (i = 0; i < pool->nr_free; i++) {
-		err = fn(pool->freelist[i], data);
-		if (err)
-			break;
-	}
-	spin_unlock_irqrestore(&pool->lock, flags);
-out:
-	return err;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_for_each_free);
-
-/**
- * percpu_ida_free_tags - return free tags number of a specific cpu or global pool
- * @pool: pool related
- * @cpu: specific cpu or global pool if @cpu == nr_cpu_ids
- *
- * Note: this just returns a snapshot of free tags number.
- */
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu)
-{
-	struct percpu_ida_cpu *remote;
-	if (cpu == nr_cpu_ids)
-		return pool->nr_free;
-	remote = per_cpu_ptr(pool->tag_cpu, cpu);
-	return remote->nr_free;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free_tags);
-- 
2.17.1



