Netdev List
* The ubufs->refcount may be subtracted twice when tun_get_user fails
From: wangyunjian @ 2016-11-29  9:30 UTC (permalink / raw)
  To: mst@redhat.com, Jason Wang, netdev@vger.kernel.org; +Cc: caihe

In the function tun_get_user(), the ubufs->refcount may be subtracted twice when msg_control is true and zerocopy is false.

Consider the code below:

static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
                                void *msg_control, struct iov_iter *from,
                                int noblock)
{
         ...

         if (zerocopy)
                   err = zerocopy_sg_from_iter(skb, from);
         else {
                   err = skb_copy_datagram_from_iter(skb, 0, from, len);
                   if (!err && msg_control) {
                            struct ubuf_info *uarg = msg_control;
                            uarg->callback(uarg, false);                       --> the ubufs->refcount is subtracted first.
                   }
         }

         if (err) {
                   this_cpu_inc(tun->pcpu_stats->rx_dropped);
                   kfree_skb(skb);
                   return -EFAULT;
         }

         err = virtio_net_hdr_to_skb(skb, &gso, tun_is_little_endian(tun));
         if (err) {
                   this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
                   kfree_skb(skb);
                   return -EINVAL;                                   --> here, the ubufs->refcount will be subtracted twice when virtio_net_hdr_to_skb() returns an error.
         }

         switch (tun->flags & TUN_TYPE_MASK) {
         case IFF_TUN:
                   if (tun->flags & IFF_NO_PI) {
                            switch (skb->data[0] & 0xf0) {
                            case 0x40:
                                     pi.proto = htons(ETH_P_IP);
                                     break;
                            case 0x60:
                                     pi.proto = htons(ETH_P_IPV6);
                                     break;
                            default:
                                     this_cpu_inc(tun->pcpu_stats->rx_dropped);
                                     kfree_skb(skb);
                                     return -EINVAL;                          --> here, the refcount will also be subtracted twice.
                            }
                   }

                   skb_reset_mac_header(skb);
                   skb->protocol = pi.proto;
                   skb->dev = tun->dev;
                   break;
         case IFF_TAP:
                   skb->protocol = eth_type_trans(skb, tun->dev);
                   break;
         }
		...
}
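
The pattern reported above can be modelled in user space. This is only a toy sketch, not the tun or vhost code: every toy_* name is hypothetical, and it assumes that the caller drops its own reference whenever the call reports failure (the report does not say where the second decrement happens, so that part is an assumption).

```c
/* Hypothetical stand-in for the ubuf reference counter; not the kernel type. */
struct toy_ubufs {
	int refcount;
};

static void toy_put(struct toy_ubufs *u)
{
	u->refcount--;
}

/*
 * Toy version of the copy path: the completion callback runs right after
 * a successful copy, but a later validation step can still fail.
 */
static int toy_get_user(struct toy_ubufs *u, int header_ok)
{
	toy_put(u);		/* models uarg->callback(uarg, false) */
	if (!header_ok)
		return -22;	/* models the later "return -EINVAL" */
	return 0;
}

/* Assumed caller behaviour: drop the reference itself when the call fails. */
static int toy_send(struct toy_ubufs *u, int header_ok)
{
	int err;

	u->refcount++;		/* reference taken for this buffer */
	err = toy_get_user(u, header_ok);
	if (err)
		toy_put(u);	/* second drop for the same reference */
	return err;
}
```

Starting from a count of 1, a successful toy_send() leaves the count unchanged, while a failing one leaves it one lower than it started, because the same reference was released both by the callback and by the error handling.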

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Sergei Shtylyov @ 2016-11-29  9:28 UTC (permalink / raw)
  To: Souptick Joarder, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20161129065931.GA3245@gnr743-HP-ZBook-15>

Hello.

On 11/29/2016 9:59 AM, Souptick Joarder wrote:

> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc().

    One more nit since you're going to send it again...

> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> v2:
>   - Address comment from sergei
>     Alignment was not proper
>
>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..96cdf9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>  	if (!mailbox)
>  		return ERR_PTR(-ENOMEM);
>
> -	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -				      &mailbox->dma);
> +	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +				       &mailbox->dma);
>  	if (!mailbox->buf) {
>  		kfree(mailbox);
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> -	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>

    Remove empty line here -- one is enough.

>  	return mailbox;
>  }

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net V2] net/sched: pedit: make sure that offset is valid
From: zhuyj @ 2016-11-29  9:27 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Cong Wang, Jamal Hadi Salim, Or Gerlitz,
	Hadar Har-Zion, Jiri Pirko
In-Reply-To: <20161129071434.GA22491@office.localdomain>

 Thanks a lot.
When will offset become -1?

On Tue, Nov 29, 2016 at 3:14 PM, Amir Vadai <amir@vadai.me> wrote:
> On Tue, Nov 29, 2016 at 10:32:05AM +0800, zhuyj wrote:
>>  +       if (offset > 0 && offset > skb->len)
>>
>> offset > skb->len is enough?
> offset is signed and skb->len is unsigned. Therefore for example if
> offset=-1 and skb->len=10, the actual comparison is 0xff...>10
>
>>
>> On Mon, Nov 28, 2016 at 6:56 PM, Amir Vadai <amir@vadai.me> wrote:
>> > Add a validation function to make sure offset is valid:
>> > 1. Not below skb head (could happen when offset is negative).
>> > 2. Validate both 'offset' and 'at'.
>> >
>> > Signed-off-by: Amir Vadai <amir@vadai.me>
>> > ---
>> > Hi Dave,
>> >
>> > Please pull to -stable branches.
>> >
>> > Changes from V0:
>> > - Add a validation to the 'at' value (this is used as an offset too)
>> > - Instead of validating the output of skb_header_pointer(), make sure that the
>> >         offset is good before calling it.
>> >
>> > Thanks,
>> > Amir
>> >  net/sched/act_pedit.c | 24 ++++++++++++++++++++----
>> >  1 file changed, 20 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
>> > index b54d56d4959b..cf9b2fe8eac6 100644
>> > --- a/net/sched/act_pedit.c
>> > +++ b/net/sched/act_pedit.c
>> > @@ -108,6 +108,17 @@ static void tcf_pedit_cleanup(struct tc_action *a, int bind)
>> >         kfree(keys);
>> >  }
>> >
>> > +static bool offset_valid(struct sk_buff *skb, int offset)
>> > +{
>> > +       if (offset > 0 && offset > skb->len)
>> > +               return false;
>> > +
>> > +       if  (offset < 0 && -offset > skb_headroom(skb))
>> > +               return false;
>> > +
>> > +       return true;
>> > +}
>> > +
>> >  static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>> >                      struct tcf_result *res)
>> >  {
>> > @@ -134,6 +145,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>> >                         if (tkey->offmask) {
>> >                                 char *d, _d;
>> >
>> > +                               if (!offset_valid(skb, off + tkey->at)) {
>> > +                                       pr_info("tc filter pedit 'at' offset %d out of bounds\n",
>> > +                                               off + tkey->at);
>> > +                                       goto bad;
>> > +                               }
>> >                                 d = skb_header_pointer(skb, off + tkey->at, 1,
>> >                                                        &_d);
>> >                                 if (!d)
>> > @@ -146,10 +162,10 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>> >                                         " offset must be on 32 bit boundaries\n");
>> >                                 goto bad;
>> >                         }
>> > -                       if (offset > 0 && offset > skb->len) {
>> > -                               pr_info("tc filter pedit"
>> > -                                       " offset %d can't exceed pkt length %d\n",
>> > -                                      offset, skb->len);
>> > +
>> > +                       if (!offset_valid(skb, off + offset)) {
>> > +                               pr_info("tc filter pedit offset %d out of bounds\n",
>> > +                                       offset);
>> >                                 goto bad;
>> >                         }
>> >
>> > --
>> > 2.10.2
>> >
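
The signed/unsigned comparison explained in the quoted reply can be tried in user space. This is a minimal sketch with plain integers instead of an skb; the toy_* names are hypothetical, and the explicit cast in toy_naive_exceeds() spells out what the usual arithmetic conversions do implicitly when an int is compared with an unsigned int.

```c
#include <stdbool.h>

/*
 * Naive check: comparing int with unsigned int converts the int operand
 * to unsigned, so offset = -1 turns into a very large value.
 */
static bool toy_naive_exceeds(int offset, unsigned int len)
{
	return (unsigned int)offset > len;
}

/* Check in the style of the patch: compare only once offset is positive. */
static bool toy_guarded_exceeds(int offset, unsigned int len)
{
	return offset > 0 && (unsigned int)offset > len;
}

/* Toy counterpart of offset_valid(): len and headroom replace the skb. */
static bool toy_offset_valid(unsigned int len, unsigned int headroom,
			     int offset)
{
	if (toy_guarded_exceeds(offset, len))
		return false;

	/* A negative offset may not reach below the start of the headroom. */
	if (offset < 0 && (unsigned int)(-(long long)offset) > headroom)
		return false;

	return true;
}
```

With offset = -1 and len = 10, the naive check reports "too large" even though the offset is negative, which is why the patch tests the sign first and handles negative offsets against the headroom separately.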

^ permalink raw reply

* RE: [PATCH net-next] net: hns: Fix to conditionally convey RX checksum flag to stack
From: Salil Mehta @ 2016-11-29  9:13 UTC (permalink / raw)
  To: David Miller
  Cc: Zhuangyuzeng (Yisen), mehta.salil.lnk@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm
In-Reply-To: <20161128.121240.1321057221950786765.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, November 28, 2016 5:13 PM
> To: Salil Mehta
> Cc: Zhuangyuzeng (Yisen); mehta.salil.lnk@gmail.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH net-next] net: hns: Fix to conditionally convey RX
> checksum flag to stack
> 
> From: Salil Mehta <salil.mehta@huawei.com>
> Date: Fri, 25 Nov 2016 13:32:40 +0000
> 
> > @@ -778,6 +778,35 @@ int hns_ae_get_regs_len(struct hnae_handle
> *handle)
> >  	return total_num;
> >  }
> >
> > +static bool hns_ae_is_l3l4_csum_err(struct hnae_handle *handle)
> > +{
> > +	struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
> > +	u32 regval;
> > +	bool retval = false;
> > +
> > +	/* read PPE_HIS_PRO_ERR register and check for the checksum
> errors */
> > +	regval = dsaf_read_dev(ppe_cb, PPE_HIS_PRO_ERR_REG);
> > +
> 
> I don't see how a single register can properly provide error status for
> a ring
> of pending received packets.
> 
> No matter how this register is implemented, it is either going to
> result in
> packets erroneously being marked as having errors, or error status
> being
> lost when multiple packets in a row have such errors.
> 
> For example, if you receive several packets in a row that have errors,
> you'll read this register for the first one.  If this read clears the
> error
> status, which I am guessing it does, then you won't see the error
> status
> for the next packet that had one of these errors as well.
Agreed David. I think I missed this part. This register is 
not well thought of and looks useless for checksum. Thanks
for identifying this!

> 
> If you don't have something which is provided on a per-packet basis
> then you can't determine the error properly.  Therefore you will just
> have to always ignore the checksum if there is any error indicated in
> the ring descriptor.
Yes, will float another patch ignoring the checksum.

Thanks
Salil 
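
The failure mode David describes can be shown with a small user-space model. It is a sketch only: the toy_* names are hypothetical, and it assumes a single latched error bit that is cleared on read, which is David's guess about the register rather than documented hardware behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: one latched, clear-on-read error bit. */
struct toy_dev {
	bool err_latched;
};

/* Hardware side: a packet arrives; a bad checksum latches the shared bit. */
static void toy_hw_receive(struct toy_dev *dev, bool csum_bad)
{
	if (csum_bad)
		dev->err_latched = true;
}

/* Driver side: reading the register returns the bit and clears it. */
static bool toy_read_clear(struct toy_dev *dev)
{
	bool v = dev->err_latched;

	dev->err_latched = false;
	return v;
}

/*
 * All n packets land in the ring before the driver polls it; the driver
 * then consults the shared register once per packet.
 */
static void toy_poll_ring(const bool *csum_bad, bool *flagged, size_t n)
{
	struct toy_dev dev = { .err_latched = false };
	size_t i;

	for (i = 0; i < n; i++)
		toy_hw_receive(&dev, csum_bad[i]);
	for (i = 0; i < n; i++)
		flagged[i] = toy_read_clear(&dev);
}
```

If the second and third of four packets are bad, this model flags the first (good) packet and misses both bad ones, so one register produces both a false positive and lost errors.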

^ permalink raw reply

* Re: [PATCH v2 11/13] clocksource: export the clocks_calc_mult_shift to use by timestamp code
From: Thomas Gleixner @ 2016-11-29  9:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
	Richard Cochran, Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok,
	John Stultz
In-Reply-To: <20161128230337.6731-12-grygorii.strashko-l0cyMroinI0@public.gmane.org>

On Mon, 28 Nov 2016, Grygorii Strashko wrote:

> From: Murali Karicheri <m-karicheri2-l0cyMroinI0@public.gmane.org>
> 
> The CPSW CPTS driver can timestamp tx/rx packets and needs to know the
> mult and shift factors for converting the raw timestamp value to
> nanoseconds (ptp clock). Currently these mult and shift factors are
> calculated manually and provided through DT, which makes it very hard to
> support a large number of platforms, especially if the CPTS refclk is not
> the same across boards and depends on efuse settings (Keystone 2
> platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
> CPSW CPTS (and other ptp drivers) to benefit from automatic calculation of
> mult and shift factors.
> 
> Cc: John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> Signed-off-by: Murali Karicheri <m-karicheri2-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Grygorii Strashko <grygorii.strashko-l0cyMroinI0@public.gmane.org>

Acked-by: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>

> ---
>  kernel/time/clocksource.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 7e4fad7..150242c 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -89,6 +89,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 maxsec)
>  	*mult = tmp;
>  	*shift = sft;
>  }
> +EXPORT_SYMBOL_GPL(clocks_calc_mult_shift);
>  
>  /*[Clocksource internal variables]---------
>   * curr_clocksource:
> -- 
> 2.10.1
> 
> 
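
For readers unfamiliar with the mult/shift factors mentioned in the commit message, here is a user-space sketch of how such factors can be derived and used. It is modelled on my reading of the kernel helper and is not a copy to rely on; the toy_* names are hypothetical, and the conversion ns = (cycles * mult) >> shift is the usual convention, assumed here.

```c
#include <stdint.h>

/*
 * Find mult and shift so that (cycles * mult) >> shift converts a count
 * at frequency "from" into units of frequency "to" without overflowing
 * 64 bits for up to "maxsec" seconds worth of cycles.
 */
static void toy_calc_mult_shift(uint32_t *mult, uint32_t *shift,
				uint32_t from, uint32_t to, uint32_t maxsec)
{
	uint64_t tmp;
	uint32_t sft, sftacc = 32;

	/* Narrow the allowed multiplier width as the cycle range grows. */
	tmp = ((uint64_t)maxsec * from) >> 32;
	while (tmp) {
		tmp >>= 1;
		sftacc--;
	}

	/* Pick the largest shift whose rounded multiplier still fits. */
	for (sft = 32; sft > 0; sft--) {
		tmp = (uint64_t)to << sft;
		tmp += from / 2;
		tmp /= from;
		if ((tmp >> sftacc) == 0)
			break;
	}
	*mult = (uint32_t)tmp;
	*shift = sft;
}

/* Convert a raw cycle count using the factors computed above. */
static uint64_t toy_cycles_to_ns(uint64_t cycles, uint32_t mult,
				 uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

For a 250 MHz reference clock converted to nanoseconds (to = 1000000000) with maxsec = 10, the factors work out to exactly 4 ns per cycle, which is the kind of value that otherwise has to be computed by hand for each board.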

^ permalink raw reply

* Re: [net PATCH 0/2] Don't use lco_csum to compute IPv4 checksum
From: Stephen Rothwell @ 2016-11-29  9:07 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: netdev, intel-wired-lan, Alexander Duyck, davem, Eric Dumazet,
	Eli Cooper, Lance Richardson, Sven-Haegar Koch
In-Reply-To: <20161129104253.00ada847@canb.auug.org.au>

Hi Jeff,

On Tue, 29 Nov 2016 10:43:04 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 28 Nov 2016 14:26:02 -0800 Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> >
> > On Mon, 2016-11-28 at 10:42 -0500, Alexander Duyck wrote:  
> > > When I implemented the GSO partial support in the Intel drivers I was
> > > using
> > > lco_csum to compute the checksum that we needed to plug into the IPv4
> > > checksum field in order to cancel out the data that was not a part of the
> > > IPv4 header.  However this didn't take into account that the transport
> > > offset might be pointing to the inner transport header.
> > > 
> > > Instead of using lco_csum I have just coded around it so that we can use
> > > the outer IP header plus the IP header length to determine where we need
> > > to
> > > start our checksum and then just call csum_partial ourselves.
> > > 
> > > > This should fix the SIT issue reported on igb interfaces as well as
> > > > similar issues that would pop up on other Intel NICs.
> > > 
> > > ---
> > > 
> > > Alexander Duyck (2):
> > >       igb/igbvf: Don't use lco_csum to compute IPv4 checksum
> > >       ixgbe/ixgbevf: Don't use lco_csum to compute IPv4 checksum    
> > 
> > Stephen, I have applied Alex's patches to my net-queue tree.  Can you
> > confirm they resolve the bug seen?  
> 
> It's a bit tricky because the original problem only happens on my
> production server (ozlabs.org), but I will see if I can manage to just
> remove and reload the driver ...  though, the server is running a 4.7.8
> kernel and I am wondering how well these patches will apply?

We have a winner!  This fixes my problem, so I can run at full speed
with gso and tso enabled in the sit interface and tx-gso-partial
enabled on the underlying ethernet.

Thanks to everyone for diagnosis and solution.

It would be nice if this fix went into the stable kernels as well so it
will turn up in the distro kernels eventually.
-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the net tree
From: Borislav Petkov @ 2016-11-29  9:01 UTC (permalink / raw)
  To: Stephen Rothwell, David Miller
  Cc: Networking, linux-next, linux-kernel, Tom Lendacky
In-Reply-To: <20161129112232.333d3363@canb.auug.org.au>

On Tue, Nov 29, 2016 at 11:22:32AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   drivers/net/ethernet/amd/xgbe/xgbe-main.c
> 
> between commit:
> 
>   91eefaabf102 ("amd-xgbe: Fix unused suspend handlers build warning")
> 
> from the net tree and commit:
> 
>   bd8255d8ba35 ("amd-xgbe: Prepare for supporting PCI devices")
> 
> from the net-next tree.
> 
> I fixed it up (the latter removed the code modified by the former)

... except that +#ifdef CONFIG_PM is present in the new
drivers/net/ethernet/amd/xgbe/xgbe-platform.c now.

So actually the proper fix is, IMO, to convert:

+#ifdef CONFIG_PM
+static int xgbe_platform_suspend(struct device *dev)

to

+#ifdef CONFIG_PM_SLEEP
+static int xgbe_platform_suspend(struct device *dev)

so that it doesn't fire again.

David, would you prefer a patch against linux-next?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply

* Re: [PATCH v2] net: macb: Write only necessary bits in NCR in macb reset
From: Nicolas Ferre @ 2016-11-29  8:57 UTC (permalink / raw)
  To: Harini Katakam, davem, harinikatakamlinux
  Cc: netdev, linux-kernel, harinik, michals
In-Reply-To: <1480398985-24037-1-git-send-email-harinik@xilinx.com>

Le 29/11/2016 à 06:56, Harini Katakam a écrit :
> In macb_reset_hw, use read-modify-write to disable RX and TX.
> Existing settings, for ex. management port enable,
> are being cleared in the current implementation.
> Also certain reserved bits are read only.
> Hence it is better to use read-modify-write.
> Use the same method for clearing statistics as well.
> 
> Signed-off-by: Harini Katakam <harinik@xilinx.com>

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

> ---
> 
> v2:
> Make ctrl type as u32
> Improve commit description
> 
> ---
>  drivers/net/ethernet/cadence/macb.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 0e489bb..2ce3407 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -1744,14 +1744,18 @@ static void macb_reset_hw(struct macb *bp)
>  {
>  	struct macb_queue *queue;
>  	unsigned int q;
> +	u32 ctrl;
>  
>  	/* Disable RX and TX (XXX: Should we halt the transmission
>  	 * more gracefully?)
>  	 */
> -	macb_writel(bp, NCR, 0);
> +	ctrl = macb_readl(bp, NCR);
> +	ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
> +	macb_writel(bp, NCR, ctrl);
>  
>  	/* Clear the stats registers (XXX: Update stats first?) */
> -	macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
> +	ctrl |= MACB_BIT(CLRSTAT);
> +	macb_writel(bp, NCR, ctrl);
>  
>  	/* Clear all status flags */
>  	macb_writel(bp, TSR, -1);
> 
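
The difference between the old blind write and the read-modify-write in the patch can be sketched on a plain integer standing in for the NCR register. The TOY_* bit positions and function names below are illustrative assumptions, not taken from the macb register map.

```c
#include <stdint.h>

/* Illustrative bit positions only. */
#define TOY_RE	(1u << 2)	/* receive enable */
#define TOY_TE	(1u << 3)	/* transmit enable */
#define TOY_MPE	(1u << 4)	/* management port enable */

/* Old behaviour: writing 0 clears every bit, including unrelated ones. */
static uint32_t toy_reset_blind(uint32_t ncr)
{
	(void)ncr;
	return 0;
}

/* Patched behaviour: clear only RE and TE, keep everything else. */
static uint32_t toy_reset_rmw(uint32_t ncr)
{
	return ncr & ~(TOY_RE | TOY_TE);
}
```

Starting from RE | TE | MPE, the blind variant returns 0 and so drops the management port enable, while the read-modify-write variant returns just MPE.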


-- 
Nicolas Ferre

^ permalink raw reply

* RE: [PATCH v3 3/5] net: asix: Fix AX88772x resume failures
From: ASIX_Allan [Office] @ 2016-11-29  8:54 UTC (permalink / raw)
  To: 'Jon Hunter', robert.foss-ZGY8ohtN/8qB+jHODAdFcQ,
	freddy-knRN6Y/kmf1NUHwG+Fw1Kw,
	Dean_Jenkins-nmGgyN9QBj3QT0dZR+AlfA,
	Mark_Craske-nmGgyN9QBj3QT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	ivecera-H+wXaHxf7aLQT0dZR+AlfA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A,
	vpalatin-F7+t8E8rja9g9hUCZPvPmw,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	grundler-F7+t8E8rja9g9hUCZPvPmw,
	changchias-Re5JQEeQqe8AvxtiuMwx3w, andrew-g2DYL2Zd6BY,
	tremyfr-Re5JQEeQqe8AvxtiuMwx3w, colin.king-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	vpalatin-hpIqsD4AKlfQT0dZR+AlfA
In-Reply-To: <6aebd7f5-188a-f6b0-7eb0-75b764e069d3-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Dear Jon ,

We can reproduce your issue at our site on an x86 system running Linux
kernel 4.9.0-rc, and modifying the code as follows fixes it. Please let us
know if you still have problems. Thanks a lot.

static void ax88772_suspend(struct usbnet *dev)
{
        struct asix_common_private *priv = dev->driver_priv;
        u16 medium;

        /* Stop MAC operation */
-       medium = asix_read_medium_status(dev, 0);
+       medium = asix_read_medium_status(dev, 1);
        medium &= ~AX_MEDIUM_RE;
-       asix_write_medium_mode(dev, medium, 0);
+       asix_write_medium_mode(dev, medium, 1);

        netdev_dbg(dev->net, "ax88772_suspend: medium=0x%04x\n",
-                  asix_read_medium_status(dev, 0));
+                  asix_read_medium_status(dev, 1));

        /* Preserve BMCR for restoring */
        priv->presvd_phy_bmcr =
                asix_mdio_read_nopm(dev->net, dev->mii.phy_id, MII_BMCR);

        /* Preserve ANAR for restoring */
        priv->presvd_phy_advertise =
                asix_mdio_read_nopm(dev->net, dev->mii.phy_id,
                                    MII_ADVERTISE);
}


---
Best regards,
Allan Chou
Technical Support Division
ASIX Electronics Corporation
TEL: 886-3-5799500 ext.228
FAX: 886-3-5799558
E-mail: allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org 
http://www.asix.com.tw/ 


-----Original Message-----
From: Jon Hunter [mailto:jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org] 
Sent: Tuesday, November 22, 2016 11:34 PM
To: allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org; robert.foss-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org; freddy-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org;
Dean_Jenkins-nmGgyN9QBj3QT0dZR+AlfA@public.gmane.org; Mark_Craske-nmGgyN9QBj3QT0dZR+AlfA@public.gmane.org; davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org;
ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; vpalatin-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org;
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org; grundler-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org; changchias-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org;
andrew-g2DYL2Zd6BY@public.gmane.org; tremyfr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org;
linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; vpalatin-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH v3 3/5] net: asix: Fix AX88772x resume failures

Hi Allan,

On 18/11/16 15:09, Jon Hunter wrote:
> Hi Allan,
> 
> On 14/11/16 09:45, ASIX_Allan [Office] wrote:
>> Hi Jon,
>>
>> Please help to double check if the USB host controller of your Tegra 
>> platform had been powered OFF while running the ax88772_suspend() 
>> routine or not?
> 
> Sorry for the delay. Today I set up a local board to reproduce this on 
> and was able to recreate the same problem. The Tegra xhci driver does 
> not power off during suspend and simply calls xhci_suspend(). I also 
> checked vbus to see if it was turning off but it is not. Furthermore I 
> don't see a new USB device detected after the error and so I don't see 
> any evidence that it ever disconnects.

In an attempt to isolate if this is a Tegra issue or not, I recompiled
v4.9-rc6 for x86 and I was able to reproduce the problem on my desktop ...

[  256.030060] PM: Syncing filesystems ... done.
[  256.113925] PM: Preparing system for sleep (mem)
[  256.114119] Freezing user space processes ... (elapsed 0.002 seconds) done.
[  256.116701] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  256.118041] PM: Suspending system (mem)
[  256.118058] Suspending console(s) (use no_console_suspend to debug)
[  256.118324] asix 1-1.2:1.0 eth2: Failed to read reg index 0x0000: -19
[  256.118327] asix 1-1.2:1.0 eth2: Error reading Medium Status register: ffffffed
[  256.118329] asix 1-1.2:1.0 eth2: Failed to write reg index 0x0000: -19
[  256.118332] asix 1-1.2:1.0 eth2: Failed to write Medium Mode mode to 0xfeed: ffffffed
[  256.118374] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  256.118471] sd 0:0:0:0: [sda] Stopping disk
[  256.152992] hpet1: lost 1 rtc interrupts
[  256.153893] serial 00:06: disabled
[  256.153899] serial 00:06: System wakeup disabled by ACPI
[  256.154068] e1000e: EEE TX LPI TIMER: 00000011
[  256.628281] PM: suspend of devices complete after 509.782 msecs
[  256.628620] PM: late suspend of devices complete after 0.336 msecs
[  256.629366] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[  256.629595] tg3 0000:03:00.0: System wakeup enabled by ACPI
[  256.629601] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
[  256.629652] e1000e 0000:00:19.0: System wakeup enabled by ACPI
[  256.629812] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[  256.648347] PM: noirq suspend of devices complete after 19.713 msecs
[  256.648685] ACPI: Preparing to enter system sleep state S3
[  256.668275] PM: Saving platform NVS memory
[  256.668283] Disabling non-boot CPUs ...

To reproduce this, I did the following:

1. Connected the asix device and noted the net interface (ie. eth2)
2. Disabled the interface (ie. sudo ifconfig eth2 down)
3. Ran a suspend-resume cycle using rtcwake (eg. sudo rtcwake -d rtc0 -m mem -s 5)

Cheers
Jon

--
nvpublic


^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  8:25 UTC (permalink / raw)
  To: Sergei Shtylyov, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Rameshwar Sahu
In-Reply-To: <CAFqt6zYzUyZ__7iVH3LwhGzeU7RN59-md4rcmwngRZW_ahDeAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]

When I send the patch using "mutt", the alignment gets changed.
In the original patch file the alignment is correct, and
I have tested the patch with "checkpatch.pl". Attached is the original patch file.

I am using mutt -H <patch-file>

Any suggestions would be helpful.

On Tue, Nov 29, 2016 at 12:55 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Please ignore this v2 patch.
>
> On Tue, Nov 29, 2016 at 12:29 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
>> replaced by pci_pool_zalloc().
>>
>> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> v2:
>>   - Address comment from sergei
>>     Alignment was not proper
>>
>>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> index e36bebc..96cdf9a 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>>         if (!mailbox)
>>                 return ERR_PTR(-ENOMEM);
>>
>> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
>> -                                     &mailbox->dma);
>> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
>> +                                      &mailbox->dma);
>>         if (!mailbox->buf) {
>>                 kfree(mailbox);
>>                 return ERR_PTR(-ENOMEM);
>>         }
>>
>> -       memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>>
>>         return mailbox;
>>  }
>> --
>> 1.9.1
>>

Regards
Souptick

[-- Attachment #2: 0001-ethernet-mellanox-mlx4-Replace-pci_pool_alloc-by-pci.patch --]
[-- Type: text/x-patch, Size: 1290 bytes --]

From 3621eac92d271332ecc95b6f09ce25b7c6846137 Mon Sep 17 00:00:00 2001
From: Souptick Joarder <jrdr.linux@gmail.com>
Date: Tue, 29 Nov 2016 12:15:39 +0530
Subject: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by
 pci_pool_zalloc

In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
---
v2:
  - Address comment from sergei
    Alignment was not proper

 drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e36bebc..96cdf9a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
 	if (!mailbox)
 		return ERR_PTR(-ENOMEM);
 
-	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
-				      &mailbox->dma);
+	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+				       &mailbox->dma);
 	if (!mailbox->buf) {
 		kfree(mailbox);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
 
 	return mailbox;
 }
-- 
1.9.1


^ permalink raw reply related

* [net-next] neigh: remove duplicate check for same neigh
From: Zhang Shengju @ 2016-11-29  8:22 UTC (permalink / raw)
  To: netdev, dsa

Currently the loop index 'idx' is used as the index in the neigh list of interest.
It is increased only when a neigh is dumped, so it is not the absolute index in
the list. Because nothing records which neighs have already been scanned
by a previous loop, the filtered-out neighs are scanned multiple
times.

This patch makes idx the absolute index in the list: it increases no matter
whether the neigh is filtered. This prevents the above problem.

And this is in line with other dump functions.
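
The effect of the two indexing schemes on a resumed dump can be sketched in user space. This is a toy model, not the neighbour code: the toy_* names are hypothetical, and "checks" simply counts how many entries go through the filter when a dump resumes at s_idx.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list entry: only records whether the dump filter rejects it. */
struct toy_neigh {
	bool filtered;
};

/*
 * Old scheme: idx counts only entries that passed the filter, so on
 * resume every entry, including rejected ones, is filtered again.
 */
static size_t toy_checks_relative(const struct toy_neigh *e, size_t n,
				  size_t s_idx, size_t *dumped)
{
	size_t i, idx = 0, checks = 0;

	*dumped = 0;
	for (i = 0; i < n; i++) {
		checks++;		/* filter evaluated for this entry */
		if (e[i].filtered)
			continue;
		if (idx >= s_idx)
			(*dumped)++;	/* entry would be dumped here */
		idx++;
	}
	return checks;
}

/*
 * New scheme: idx is the absolute position, so entries before s_idx are
 * skipped before the filter is consulted.
 */
static size_t toy_checks_absolute(const struct toy_neigh *e, size_t n,
				  size_t s_idx, size_t *dumped)
{
	size_t idx, checks = 0;

	*dumped = 0;
	for (idx = 0; idx < n; idx++) {
		if (idx < s_idx)
			continue;
		checks++;		/* filter evaluated for this entry */
		if (e[idx].filtered)
			continue;
		(*dumped)++;		/* entry would be dumped here */
	}
	return checks;
}
```

Take a list of two filtered entries followed by two accepted ones, where the first pass stopped after dumping the third entry. The old scheme resumes with s_idx = 1 and filters all four entries again, while the absolute scheme resumes with s_idx = 3 and filters only the last one. Both dump the same remaining entry.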

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 net/core/neighbour.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f..ce32e9c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
 	return false;
 }
 
+static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
+		int filter_master_idx)
+{
+	if (neigh_ifindex_filtered(dev, filter_idx) ||
+	    neigh_master_filtered(dev, filter_master_idx))
+		return true;
+
+	return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 			    struct netlink_callback *cb)
 {
@@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);
 
-	for (h = s_h; h < (1 << nht->hash_shift); h++) {
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
 		for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
 		     n != NULL;
-		     n = rcu_dereference_bh(n->next)) {
-			if (!net_eq(dev_net(n->dev), net))
-				continue;
-			if (neigh_ifindex_filtered(n->dev, filter_idx))
+		     n = rcu_dereference_bh(n->next), idx++) {
+			if (idx < s_idx || !net_eq(dev_net(n->dev), net))
 				continue;
-			if (neigh_master_filtered(n->dev, filter_master_idx))
+			if (neigh_dump_filtered(n->dev, filter_idx,
+						filter_master_idx))
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-next:
-			idx++;
 		}
 	}
 	rc = skb->len;
@@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 
 	read_lock_bh(&tbl->lock);
 
-	for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
-		if (h > s_h)
-			s_idx = 0;
-		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
-			if (pneigh_net(n) != net)
+	for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
+		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
+			if (idx < s_idx || pneigh_net(n) != net)
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-		next:
-			idx++;
 		}
 	}
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v2] vxlan: fix a potential issue when create a new vxlan fdb entry.
From: Jiri Benc @ 2016-11-29  8:20 UTC (permalink / raw)
  To: Haishuang Yan
  Cc: David S. Miller, Hannes Frederic Sowa, Pravin B Shelar, netdev,
	linux-kernel
In-Reply-To: <1480384776-8252-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

On Tue, 29 Nov 2016 09:59:36 +0800, Haishuang Yan wrote:
> vxlan_fdb_append may return error, so add the proper check,
> otherwise it will cause memory leak.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Changes in v2:
>   - Unnecessary to initialize rc to zero.

Acked-by: Jiri Benc <jbenc@redhat.com>

^ permalink raw reply

* Re: [PATCH net] openvswitch: Fix skb leak in IPv6 reassembly.
From: Pravin Shelar @ 2016-11-29  7:39 UTC (permalink / raw)
  To: Daniele Di Proietto
  Cc: Linux Kernel Network Developers, Florian Westphal, Joe Stringer
In-Reply-To: <20161128234353.4262-1-diproiettod@ovn.org>

On Mon, Nov 28, 2016 at 3:43 PM, Daniele Di Proietto
<diproiettod@ovn.org> wrote:
> If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it
> means that we still have a reference to the skb.  We should free it
> before returning from handle_fragments, as stated in the comment above.
>
> Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion")
> CC: Florian Westphal <fw@strlen.de>
> CC: Pravin B Shelar <pshelar@ovn.org>
> CC: Joe Stringer <joe@ovn.org>
> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org>
> ---
>  net/openvswitch/conntrack.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 31045ef..fecefa2 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -370,8 +370,11 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
>                 skb_orphan(skb);
>                 memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
>                 err = nf_ct_frag6_gather(net, skb, user);
> -               if (err)
> +               if (err) {
> +                       if (err != -EINPROGRESS)
> +                               kfree_skb(skb);
>                         return err;
> +               }
>

This fixes the code, but the patch adds yet another kfree_skb() call to the
conntrack code. We could simplify it by reusing the error handling in
do_execute_actions().
If you think that is too complicated for the stable branch, I am fine with
this patch going in as it is.
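
For reference, the ownership rule the patch relies on can be modelled in
userspace roughly as below. This is only a sketch under stated assumptions:
struct pkt, gather(), pkt_free() and handle_fragment() are hypothetical
stand-ins, not the kernel functions, and the contract of gather() is taken
from the commit message above (the callee keeps the packet only when it
returns -EINPROGRESS).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an skb: we only track who owns it. */
struct pkt {
	bool queued;	/* the reassembly queue now owns the packet */
	bool freed;	/* the packet has been released */
};

/* Models a reassembly helper: returns 0 when reassembly is complete,
 * -EINPROGRESS when the fragment was queued (callee keeps it), or another
 * negative errno (caller still owns the packet). */
static int gather(struct pkt *p, int outcome)
{
	if (outcome == -EINPROGRESS)
		p->queued = true;
	return outcome;
}

static void pkt_free(struct pkt *p)
{
	/* Catch a double free or a free of a packet the queue still owns. */
	assert(!p->freed && !p->queued);
	p->freed = true;
}

/* Caller side: free only when an error left ownership with the caller. */
static int handle_fragment(struct pkt *p, int outcome)
{
	int err = gather(p, outcome);

	if (err) {
		if (err != -EINPROGRESS)
			pkt_free(p);
		return err;
	}
	return 0;
}
```

With this model, a -EINPROGRESS result leaves the packet queued and unfreed,
while any other error frees it exactly once.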

^ permalink raw reply

* RE: [PATCH] net: brocade: bna: use new api ethtool_{get|set}_link_ksettings
From: Mody, Rasesh @ 2016-11-29  7:37 UTC (permalink / raw)
  To: Philippe Reynes, Kalluru, Sudarsana, Dept-GE Linux NIC Dev,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1480373539-3257-1-git-send-email-tremyfr@gmail.com>

> From: Philippe Reynes [mailto:tremyfr@gmail.com]
> Sent: Monday, November 28, 2016 2:52 PM
> 
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Acked-by: Rasesh Mody <Rasesh.Mody@cavium.com> 

> ---
>  drivers/net/ethernet/brocade/bna/bnad_ethtool.c |   54 +++++++++++++--
> --------
>  1 files changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> index 31f61a7..2865939 100644
> --- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> +++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  7:25 UTC (permalink / raw)
  To: Sergei Shtylyov, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Rameshwar Sahu
In-Reply-To: <20161129065931.GA3245@gnr743-HP-ZBook-15>

Please ignore this v2 patch.

On Tue, Nov 29, 2016 at 12:29 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc().
>
> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> v2:
>   - Address comment from sergei
>     Alignment was not proper
>
>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..96cdf9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>         if (!mailbox)
>                 return ERR_PTR(-ENOMEM);
>
> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -                                     &mailbox->dma);
> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +                                      &mailbox->dma);
>         if (!mailbox->buf) {
>                 kfree(mailbox);
>                 return ERR_PTR(-ENOMEM);
>         }
>
> -       memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>
>         return mailbox;
>  }
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Brice Goglin @ 2016-11-29  7:21 UTC (permalink / raw)
  To: Baoquan He; +Cc: Linux Network Development list
In-Reply-To: <20161129070211.GC3126@x1>

I only tested 4.8.5 and 4.9-rc5 unfortunately, they came later. I'll
ping my distro.
Thanks for the quick reply!
Brice



On 29/11/2016 08:02, Baoquan He wrote:
> Sorry, Brice. This has already been reported, and it has been fixed by a
> later post. The commits in Linus's tree are:
>
> commit 6df77862f63f389df3b1ad879738e04440d7385d
> Author: Baoquan He <bhe@redhat.com>
> Date:   Sun Nov 13 13:01:33 2016 +0800
>
>     bnx2: Wait for in-flight DMA to complete at probe stage
>
> commit 5d0d4b91bf627f14f95167b738d524156c9d440b
> Author: Baoquan He <bhe@redhat.com>
> Date:   Sun Nov 13 13:01:32 2016 +0800
>
>     Revert "bnx2: Reset device during driver initialization"
>     
>     This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.
>
> And I believe both of them have also been picked up into the 4.8-stable
> kernel. Please find a way to get them.
>
> Sorry again!
>
> Thanks
> Baoquan
>
>
> On 11/29/16 at 07:57am, Brice Goglin wrote:
>> Hello
>>
>> My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
>> kernel, the BMC doesn't even ping anymore. Its Ethernet devices are 4 of
>> those:
>>
>> 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>> 	DeviceName: Embedded NIC 1                          
>> 	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
>> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0, Cache Line Size: 64 bytes
>> 	Interrupt: pin A routed to IRQ 42
>> 	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
>> 	Capabilities: <access denied>
>> 	Kernel driver in use: bnx2
>> 	Kernel modules: bnx2
>>
>> The only change in bnx2 between 4.7 and 4.8 appears to be this one:
>>
>> commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
>> Author: Baoquan He <bhe@redhat.com>
>> Date:   Fri Sep 9 22:43:12 2016 +0800
>>
>>     bnx2: Reset device during driver initialization
>>
>> Could your patch actually break the BMC? What do I need to further debug
>> this issue?
>>
>> Thanks
>> Brice
>>

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.
From: Pravin Shelar @ 2016-11-29  7:21 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: Linux Kernel Network Developers, Jiri Benc
In-Reply-To: <1480387276-123557-2-git-send-email-jarno@ovn.org>

On Mon, Nov 28, 2016 at 6:41 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Do not set skb->protocol to be the ethertype of the L3 header, unless
> the packet only has the L3 header.  For a non-hardware offloaded VLAN
> frame skb->protocol needs to be one of the VLAN ethertypes.
>
> Any VLAN offloading is undone on the OVS netlink interface.  Also any
> VLAN tags added by userspace are non-offloaded.
>
> Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
> causes packet drop due to failing MTU check, as the VLAN header should
> not be counted in when considering MTU in ovs_vport_send().
>
I think we should move to an is_skb_forwardable()-style packet length
check in ovs_vport_send() and get rid of the skb->protocol checks altogether.

> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> ---
> v2: Set skb->protocol when an ETH_P_TEB frame is received via ARPHRD_NONE
>     interface.
>
>  net/openvswitch/datapath.c |  1 -
>  net/openvswitch/flow.c     | 30 ++++++++++++++++++++++--------
>  2 files changed, 22 insertions(+), 9 deletions(-)
...
...
> @@ -531,15 +538,22 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                 if (unlikely(parse_vlan(skb, key)))
>                         return -ENOMEM;
>
> -               skb->protocol = parse_ethertype(skb);
> -               if (unlikely(skb->protocol == htons(0)))
> +               key->eth.type = parse_ethertype(skb);
> +               if (unlikely(key->eth.type == htons(0)))
>                         return -ENOMEM;
>
> +               if (skb->protocol == htons(ETH_P_TEB)) {
> +                       if (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)
> +                           && !skb_vlan_tag_present(skb))
> +                               skb->protocol = key->eth.vlan.tpid;
> +                       else
> +                               skb->protocol = key->eth.type;
> +               }
> +

I am not sure this works in the case of nested VLANs.
Can we move the skb->protocol assignment into parse_vlan() to avoid checking
for the non-accelerated VLAN case again here?

^ permalink raw reply

* Re: [PATCH net V2] net/sched: pedit: make sure that offset is valid
From: Amir Vadai @ 2016-11-29  7:14 UTC (permalink / raw)
  To: zhuyj
  Cc: David S. Miller, netdev, Cong Wang, Jamal Hadi Salim, Or Gerlitz,
	Hadar Har-Zion, Jiri Pirko
In-Reply-To: <CAD=hENcSiWiH-9e6=gjn+wK4R6ZQsNa21R_w_eWzDtCxiUDNVQ@mail.gmail.com>

On Tue, Nov 29, 2016 at 10:32:05AM +0800, zhuyj wrote:
>  +       if (offset > 0 && offset > skb->len)
> 
> offset > skb->len is enough?
No: offset is signed and skb->len is unsigned, so offset is converted to
unsigned before the comparison. For example, with offset=-1 and skb->len=10,
the actual comparison is 0xff... > 10, which is true.
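
A small userspace sketch of the same effect, in case it helps. The helper
names are hypothetical and 'len' only stands in for skb->len; the casts spell
out what the usual arithmetic conversions do implicitly when an int is
compared with an unsigned int.

```c
#include <assert.h>
#include <stdbool.h>

/* Naive form: the signed offset is converted to unsigned before the
 * comparison, so -1 becomes UINT_MAX and every negative offset "exceeds"
 * the length. */
static bool naive_exceeds_len(int offset, unsigned int len)
{
	return (unsigned int)offset > len;	/* implicit conversion made explicit */
}

/* Guarded form, as in the patch: only positive offsets are compared with
 * the length; negative ones are left for the headroom check. */
static bool guarded_exceeds_len(int offset, unsigned int len)
{
	return offset > 0 && (unsigned int)offset > len;
}

/* Usage, mirroring the example above: offset = -1, len = 10. */
void offset_compare_selfcheck(void)
{
	assert(naive_exceeds_len(-1, 10));	/* wrongly reported as too large */
	assert(!guarded_exceeds_len(-1, 10));	/* correctly passed on */
}
```

The guarded form is why the "offset > 0 &&" part cannot be dropped.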

> 
> On Mon, Nov 28, 2016 at 6:56 PM, Amir Vadai <amir@vadai.me> wrote:
> > Add a validation function to make sure offset is valid:
> > 1. Not below skb head (could happen when offset is negative).
> > 2. Validate both 'offset' and 'at'.
> >
> > Signed-off-by: Amir Vadai <amir@vadai.me>
> > ---
> > Hi Dave,
> >
> > Please pull to -stable branches.
> >
> > Changes from V0:
> > - Add a validation to the 'at' value (this is used as an offset too)
> > - Instead of validating the output of skb_header_pointer(), make sure that the
> >         offset is good before calling it.
> >
> > Thanks,
> > Amir
> >  net/sched/act_pedit.c | 24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index b54d56d4959b..cf9b2fe8eac6 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -108,6 +108,17 @@ static void tcf_pedit_cleanup(struct tc_action *a, int bind)
> >         kfree(keys);
> >  }
> >
> > +static bool offset_valid(struct sk_buff *skb, int offset)
> > +{
> > +       if (offset > 0 && offset > skb->len)
> > +               return false;
> > +
> > +       if  (offset < 0 && -offset > skb_headroom(skb))
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> >  static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                      struct tcf_result *res)
> >  {
> > @@ -134,6 +145,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                         if (tkey->offmask) {
> >                                 char *d, _d;
> >
> > +                               if (!offset_valid(skb, off + tkey->at)) {
> > +                                       pr_info("tc filter pedit 'at' offset %d out of bounds\n",
> > +                                               off + tkey->at);
> > +                                       goto bad;
> > +                               }
> >                                 d = skb_header_pointer(skb, off + tkey->at, 1,
> >                                                        &_d);
> >                                 if (!d)
> > @@ -146,10 +162,10 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                                         " offset must be on 32 bit boundaries\n");
> >                                 goto bad;
> >                         }
> > -                       if (offset > 0 && offset > skb->len) {
> > -                               pr_info("tc filter pedit"
> > -                                       " offset %d can't exceed pkt length %d\n",
> > -                                      offset, skb->len);
> > +
> > +                       if (!offset_valid(skb, off + offset)) {
> > +                               pr_info("tc filter pedit offset %d out of bounds\n",
> > +                                       offset);
> >                                 goto bad;
> >                         }
> >
> > --
> > 2.10.2
> >

^ permalink raw reply

* Re: bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Baoquan He @ 2016-11-29  7:02 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Linux Network Development list
In-Reply-To: <583D26EF.60207@inria.fr>

Sorry, Brice. This has already been reported, and it has been fixed by a
later post. The commits in Linus's tree are:

commit 6df77862f63f389df3b1ad879738e04440d7385d
Author: Baoquan He <bhe@redhat.com>
Date:   Sun Nov 13 13:01:33 2016 +0800

    bnx2: Wait for in-flight DMA to complete at probe stage

commit 5d0d4b91bf627f14f95167b738d524156c9d440b
Author: Baoquan He <bhe@redhat.com>
Date:   Sun Nov 13 13:01:32 2016 +0800

    Revert "bnx2: Reset device during driver initialization"
    
    This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.

And I believe both of them have also been picked up into the 4.8-stable
kernel. Please find a way to get them.

Sorry again!

Thanks
Baoquan


On 11/29/16 at 07:57am, Brice Goglin wrote:
> Hello
> 
> My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
> kernel, the BMC doesn't even ping anymore. Its Ethernet devices are 4 of
> those:
> 
> 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 	DeviceName: Embedded NIC 1                          
> 	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 42
> 	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
> 	Capabilities: <access denied>
> 	Kernel driver in use: bnx2
> 	Kernel modules: bnx2
> 
> The only change in bnx2 between 4.7 and 4.8 appears to be this one:
> 
> commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
> Author: Baoquan He <bhe@redhat.com>
> Date:   Fri Sep 9 22:43:12 2016 +0800
> 
>     bnx2: Reset device during driver initialization
> 
> Could your patch actually break the BMC? What do I need to further debug
> this issue?
> 
> Thanks
> Brice
> 

^ permalink raw reply

* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Cong Wang @ 2016-11-29  6:59 UTC (permalink / raw)
  To: Roi Dayan
  Cc: Daniel Borkmann, Linux Kernel Network Developers, Jiri Pirko,
	John Fastabend
In-Reply-To: <583A7D67.50003@mellanox.com>

On Sat, Nov 26, 2016 at 10:29 PM, Roi Dayan <roid@mellanox.com> wrote:
> Hi,
>
> I tested "[PATCH net] net, sched: respect rcu grace period on cls
> destruction" and could not reproduce my original issue.
> I rebased "[Patch net-next] net_sched: move the empty tp check from
> ->destroy() to ->delete()" over to test it in the same tree and got into a
> new trace in fl_delete.

I will take care of this when I rebase my patch.

Thanks for testing anyway.

^ permalink raw reply

* [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  6:59 UTC (permalink / raw)
  To: sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w

In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
v2:
  - Address comment from sergei
    Alignment was not proper

 drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e36bebc..96cdf9a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
 	if (!mailbox)
 		return ERR_PTR(-ENOMEM);
 
-	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
-				      &mailbox->dma);
+	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+				       &mailbox->dma);
 	if (!mailbox->buf) {
 		kfree(mailbox);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
 
 	return mailbox;
 }
-- 
1.9.1

^ permalink raw reply related

* [WIP] net+mlx4: auto doorbell
From: Eric Dumazet @ 2016-11-29  6:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan
In-Reply-To: <1479751857.8455.419.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 2016-11-21 at 10:10 -0800, Eric Dumazet wrote:


> Not sure if this has been tried before, but the doorbell avoidance could
> be done by the driver itself, because it knows a TX completion will come
> shortly (well... if softirqs are not delayed too much !)
> 
> Doorbell would be forced only if :
> 
> (    "skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
> OR
> ( too many [1] packets were put in TX ring buffer, no point deferring
> more)
> 
> Start the pump, but once it is started, let the doorbells be done by
> TX completion.
> 
> ndo_start_xmit and TX completion handler would have to maintain a shared
> state describing if packets were ready but doorbell deferred.
> 
> 
> Note that TX completion means "if at least one packet was drained",
> otherwise busy polling, constantly calling napi->poll() would force a
> doorbell too soon for devices sharing a NAPI for both RX and TX.
> 
> But then, maybe busy poll would like to force a doorbell...
> 
> I could try these ideas on mlx4 shortly.
> 
> 
> [1] limit could be derived from active "ethtool -c" params, eg tx-frames
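
The deferral rule quoted above can be sketched in plain C as below. This is a
userspace model under stated assumptions, not the driver code: struct
txq_model and both helpers are hypothetical, max_deferred stands in for the
"too many packets" limit [1], and the pending test assumes the doorbell and
completion counters advance in the same units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one TX ring; not the mlx4 structures. */
struct txq_model {
	uint32_t prod;		/* descriptors posted so far */
	uint32_t prod_bell;	/* value of prod when the doorbell was last rung */
	uint32_t ncons;		/* descriptors known to be completed */
	uint32_t max_deferred;	/* assumed cap on descriptors posted without a doorbell */
};

/* True while the NIC has been told about work it has not completed yet, so
 * a TX completion is still due. The subtraction is done modulo 2^32 and read
 * as signed, which keeps the test valid across counter wraparound (this
 * relies on two's complement conversion, as kernel code does). */
static bool completion_pending(const struct txq_model *q)
{
	return (int32_t)(q->prod_bell - q->ncons) > 0;
}

/* Decide whether the xmit path has to ring the doorbell itself. */
static bool must_ring_doorbell(const struct txq_model *q, bool xmit_more)
{
	uint32_t deferred = q->prod - q->prod_bell;

	if (deferred > q->max_deferred)
		return true;	/* too many packets waiting, no point deferring more */

	/* Otherwise ring only to start the pump; once a completion is due,
	 * the completion handler can ring on our behalf. */
	return !xmit_more && !completion_pending(q);
}

/* Usage: an idle ring needs a doorbell, a busy one does not. */
void doorbell_model_selfcheck(void)
{
	struct txq_model idle = { .prod = 11, .prod_bell = 10, .ncons = 10, .max_deferred = 64 };
	struct txq_model busy = { .prod = 12, .prod_bell = 10, .ncons = 8, .max_deferred = 64 };

	assert(must_ring_doorbell(&idle, false));
	assert(!must_ring_doorbell(&busy, false));
}
```

The completion side would then ring the doorbell whenever prod has moved past
prod_bell, which is the shared state mentioned above.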

I have a WIP that increases the pktgen rate by 75 % on mlx4 when bulking is
not used.

lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt 
lpaa23:~# sar -n DEV 1 10|grep eth0
22:43:26         eth0      0.00 5822800.00      0.00 597064.41      0.00      0.00      1.00
22:43:27         eth0     24.00 5788237.00      2.09 593520.26      0.00      0.00      0.00
22:43:28         eth0     12.00 5817777.00      1.43 596551.47      0.00      0.00      1.00
22:43:29         eth0     22.00 5841516.00      1.61 598982.87      0.00      0.00      0.00
22:43:30         eth0      4.00 4389137.00      0.71 450058.08      0.00      0.00      1.00
22:43:31         eth0      4.00 5871008.00      0.72 602007.79      0.00      0.00      0.00
22:43:32         eth0     12.00 5891809.00      1.43 604142.60      0.00      0.00      1.00
22:43:33         eth0     10.00 5901904.00      1.12 605175.70      0.00      0.00      0.00
22:43:34         eth0      5.00 5907982.00      0.69 605798.99      0.00      0.00      1.00
22:43:35         eth0      2.00 5847086.00      0.12 599554.71      0.00      0.00      0.00
Average:         eth0      9.50 5707925.60      0.99 585285.69      0.00      0.00      0.50
lpaa23:~# echo 1 >/sys/class/net/eth0/doorbell_opt 
lpaa23:~# sar -n DEV 1 10|grep eth0
22:43:47         eth0      9.00 10226424.00      1.02 1048608.05      0.00      0.00      1.00
22:43:48         eth0      1.00 10316955.00      0.06 1057890.89      0.00      0.00      0.00
22:43:49         eth0      1.00 10310104.00      0.10 1057188.32      0.00      0.00      1.00
22:43:50         eth0      0.00 10249423.00      0.00 1050966.23      0.00      0.00      0.00
22:43:51         eth0      0.00 10210441.00      0.00 1046969.05      0.00      0.00      1.00
22:43:52         eth0      2.00 10198389.00      0.16 1045733.17      0.00      0.00      1.00
22:43:53         eth0      8.00 10079257.00      1.43 1033517.83      0.00      0.00      0.00
22:43:54         eth0      0.00 7693509.00      0.00 788885.16      0.00      0.00      0.00
22:43:55         eth0      2.00 10343076.00      0.20 1060569.32      0.00      0.00      1.00
22:43:56         eth0      1.00 10224571.00      0.14 1048417.93      0.00      0.00      0.00
Average:         eth0      2.40 9985214.90      0.31 1023874.60      0.00      0.00      0.50

And about 11 % improvement on a mono-flow UDP_STREAM test.

skb_set_owner_w() is now the most consuming function.


lpaa23:~# ./udpsnd -4 -H 10.246.7.152 -d 2 &
[1] 13696
lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt
lpaa23:~# sar -n DEV 1 10|grep eth0
22:50:47         eth0      3.00 1355422.00      0.45 319706.04      0.00      0.00      0.00
22:50:48         eth0      2.00 1344270.00      0.42 317035.21      0.00      0.00      1.00
22:50:49         eth0      3.00 1350503.00      0.51 318478.34      0.00      0.00      0.00
22:50:50         eth0     29.00 1348593.00      2.86 318113.02      0.00      0.00      1.00
22:50:51         eth0     14.00 1354855.00      1.83 319508.56      0.00      0.00      0.00
22:50:52         eth0      7.00 1357794.00      0.73 320226.89      0.00      0.00      1.00
22:50:53         eth0      5.00 1326130.00      0.63 312784.72      0.00      0.00      0.00
22:50:54         eth0      2.00 994584.00      0.12 234598.40      0.00      0.00      1.00
22:50:55         eth0      5.00 1318209.00      0.75 310932.46      0.00      0.00      0.00
22:50:56         eth0     20.00 1323445.00      1.73 312178.11      0.00      0.00      1.00
Average:         eth0      9.00 1307380.50      1.00 308356.18      0.00      0.00      0.50
lpaa23:~# echo 3 >/sys/class/net/eth0/doorbell_opt
lpaa23:~# sar -n DEV 1 10|grep eth0
22:51:03         eth0      4.00 1512055.00      0.54 356599.40      0.00      0.00      0.00
22:51:04         eth0      4.00 1507631.00      0.55 355609.46      0.00      0.00      1.00
22:51:05         eth0      4.00 1487789.00      0.42 350917.47      0.00      0.00      0.00
22:51:06         eth0      7.00 1474460.00      1.22 347811.16      0.00      0.00      1.00
22:51:07         eth0      2.00 1496529.00      0.24 352995.18      0.00      0.00      0.00
22:51:08         eth0      3.00 1485856.00      0.49 350425.65      0.00      0.00      1.00
22:51:09         eth0      1.00 1114808.00      0.06 262905.38      0.00      0.00      0.00
22:51:10         eth0      2.00 1510924.00      0.30 356397.53      0.00      0.00      1.00
22:51:11         eth0      2.00 1506408.00      0.30 355345.76      0.00      0.00      0.00
22:51:12         eth0      2.00 1499122.00      0.32 353668.75      0.00      0.00      1.00
Average:         eth0      3.10 1459558.20      0.44 344267.57      0.00      0.00      0.50

 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    2 
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   90 +++++++++++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    4 
 include/linux/netdevice.h                    |    1 
 net/core/net-sysfs.c                         |   18 +++
 5 files changed, 83 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6562f78b07f4..fbea83218fc0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1089,7 +1089,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 	if (polled) {
 		if (doorbell_pending)
-			mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
+			mlx4_en_xmit_doorbell(dev, priv->tx_ring[TX_XDP][cq->ring]);
 
 		mlx4_cq_set_ci(&cq->mcq);
 		wmb(); /* ensure HW sees CQ consumer before we post new buffers */
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 4b597dca5c52..affebb435679 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -67,7 +67,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->size = size;
 	ring->size_mask = size - 1;
 	ring->sp_stride = stride;
-	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
+	ring->full_size = ring->size - HEADROOM - 2*MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
 	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
@@ -193,6 +193,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 	ring->sp_cqn = cq;
 	ring->prod = 0;
 	ring->cons = 0xffffffff;
+	ring->ncons = 0;
 	ring->last_nr_txbb = 1;
 	memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
 	memset(ring->buf, 0, ring->buf_size);
@@ -227,9 +228,9 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
 		       MLX4_QP_STATE_RST, NULL, 0, 0, &ring->sp_qp);
 }
 
-static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
 {
-	return ring->prod - ring->cons > ring->full_size;
+	return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > ring->full_size;
 }
 
 static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
@@ -374,6 +375,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 
 	/* Skip last polled descriptor */
 	ring->cons += ring->last_nr_txbb;
+	ring->ncons += ring->last_nr_txbb;
 	en_dbg(DRV, priv, "Freeing Tx buf - cons:0x%x prod:0x%x\n",
 		 ring->cons, ring->prod);
 
@@ -389,6 +391,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 						!!(ring->cons & ring->size), 0,
 						0 /* Non-NAPI caller */);
 		ring->cons += ring->last_nr_txbb;
+		ring->ncons += ring->last_nr_txbb;
 		cnt++;
 	}
 
@@ -401,6 +404,38 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 	return cnt;
 }
 
+void mlx4_en_xmit_doorbell(const struct net_device *dev,
+			   struct mlx4_en_tx_ring *ring)
+{
+
+	if (dev->doorbell_opt & 1) {
+		u32 oval = READ_ONCE(ring->prod_bell);
+		u32 nval = READ_ONCE(ring->prod);
+
+		if (oval == nval)
+			return;
+
+		/* I can not tell yet if a cmpxchg() is needed or not */
+		if (dev->doorbell_opt & 2)
+			WRITE_ONCE(ring->prod_bell, nval);
+		else
+			if (cmpxchg(&ring->prod_bell, oval, nval) != oval)
+				return;
+	}
+	/* Since there is no iowrite*_native() that writes the
+	 * value as is, without byteswapping - using the one
+	 * the doesn't do byteswapping in the relevant arch
+	 * endianness.
+	 */
+#if defined(__LITTLE_ENDIAN)
+	iowrite32(
+#else
+	iowrite32be(
+#endif
+		  ring->doorbell_qpn,
+		  ring->bf.uar->map + MLX4_SEND_DOORBELL);
+}
+
 static bool mlx4_en_process_tx_cq(struct net_device *dev,
 				  struct mlx4_en_cq *cq, int napi_budget)
 {
@@ -496,8 +531,13 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
 	wmb();
 
 	/* we want to dirty this cache line once */
-	ACCESS_ONCE(ring->last_nr_txbb) = last_nr_txbb;
-	ACCESS_ONCE(ring->cons) = ring_cons + txbbs_skipped;
+	WRITE_ONCE(ring->last_nr_txbb, last_nr_txbb);
+	ring_cons += txbbs_skipped;
+	WRITE_ONCE(ring->cons, ring_cons);
+	WRITE_ONCE(ring->ncons, ring_cons + last_nr_txbb);
+
+	if (dev->doorbell_opt)
+		mlx4_en_xmit_doorbell(dev, ring);
 
 	if (ring->free_tx_desc == mlx4_en_recycle_tx_desc)
 		return done < budget;
@@ -725,29 +765,14 @@ static void mlx4_bf_copy(void __iomem *dst, const void *src,
 	__iowrite64_copy(dst, src, bytecnt / 8);
 }
 
-void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
-{
-	wmb();
-	/* Since there is no iowrite*_native() that writes the
-	 * value as is, without byteswapping - using the one
-	 * the doesn't do byteswapping in the relevant arch
-	 * endianness.
-	 */
-#if defined(__LITTLE_ENDIAN)
-	iowrite32(
-#else
-	iowrite32be(
-#endif
-		  ring->doorbell_qpn,
-		  ring->bf.uar->map + MLX4_SEND_DOORBELL);
-}
 
 static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 				  struct mlx4_en_tx_desc *tx_desc,
 				  union mlx4_wqe_qpn_vlan qpn_vlan,
 				  int desc_size, int bf_index,
 				  __be32 op_own, bool bf_ok,
-				  bool send_doorbell)
+				  bool send_doorbell,
+				  const struct net_device *dev, int nr_txbb)
 {
 	tx_desc->ctrl.qpn_vlan = qpn_vlan;
 
@@ -761,6 +786,7 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 
 		wmb();
 
+		ring->prod += nr_txbb;
 		mlx4_bf_copy(ring->bf.reg + ring->bf.offset, &tx_desc->ctrl,
 			     desc_size);
 
@@ -773,8 +799,9 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 		 */
 		dma_wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
+		ring->prod += nr_txbb;
 		if (send_doorbell)
-			mlx4_en_xmit_doorbell(ring);
+			mlx4_en_xmit_doorbell(dev, ring);
 		else
 			ring->xmit_more++;
 	}
@@ -1017,8 +1044,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP);
 	}
 
-	ring->prod += nr_txbb;
-
 	/* If we used a bounce buffer then copy descriptor back into place */
 	if (unlikely(bounce))
 		tx_desc = mlx4_en_bounce_to_desc(priv, ring, index, desc_size);
@@ -1033,6 +1058,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
 
+	/* Doorbell avoidance : We can omit doorbell if we know a TX completion
+	 * will happen shortly.
+	 */
+	if (send_doorbell &&
+	    dev->doorbell_opt &&
+	    (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0)
+		send_doorbell = false;
+
 	real_size = (real_size / 16) & 0x3f;
 
 	bf_ok &= desc_size <= MAX_BF && send_doorbell;
@@ -1043,7 +1076,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		qpn_vlan.fence_size = real_size;
 
 	mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, desc_size, bf_index,
-			      op_own, bf_ok, send_doorbell);
+			      op_own, bf_ok, send_doorbell, dev, nr_txbb);
 
 	if (unlikely(stop_queue)) {
 		/* If queue was emptied after the if (stop_queue) , and before
@@ -1054,7 +1087,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		 */
 		smp_rmb();
 
-		ring_cons = ACCESS_ONCE(ring->cons);
 		if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
 			netif_tx_wake_queue(ring->tx_queue);
 			ring->wake_queue++;
@@ -1158,8 +1190,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 	rx_ring->xdp_tx++;
 	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, length);
 
-	ring->prod += nr_txbb;
-
 	stop_queue = mlx4_en_is_tx_ring_full(ring);
 	send_doorbell = stop_queue ||
 				*doorbell_pending > MLX4_EN_DOORBELL_BUDGET;
@@ -1173,7 +1203,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 		qpn_vlan.fence_size = real_size;
 
 	mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, TXBB_SIZE, bf_index,
-			      op_own, bf_ok, send_doorbell);
+			      op_own, bf_ok, send_doorbell, dev, nr_txbb);
 	*doorbell_pending = send_doorbell ? 0 : *doorbell_pending + 1;
 
 	return NETDEV_TX_OK;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 574bcbb1b38f..c3fd0deda198 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
 	 */
 	u32			last_nr_txbb;
 	u32			cons;
+	u32			ncons;
 	unsigned long		wake_queue;
 	struct netdev_queue	*tx_queue;
 	u32			(*free_tx_desc)(struct mlx4_en_priv *priv,
@@ -290,6 +291,7 @@ struct mlx4_en_tx_ring {
 
 	/* cache line used and dirtied in mlx4_en_xmit() */
 	u32			prod ____cacheline_aligned_in_smp;
+	u32			prod_bell;
 	unsigned int		tx_dropped;
 	unsigned long		bytes;
 	unsigned long		packets;
@@ -699,7 +701,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 			       struct mlx4_en_rx_alloc *frame,
 			       struct net_device *dev, unsigned int length,
 			       int tx_ind, int *doorbell_pending);
-void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
+void mlx4_en_xmit_doorbell(const struct net_device *dev, struct mlx4_en_tx_ring *ring);
 bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
 			struct mlx4_en_rx_alloc *frame);
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ffcd874cc20..39565b5425a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1816,6 +1816,7 @@ struct net_device {
 	DECLARE_HASHTABLE	(qdisc_hash, 4);
 #endif
 	unsigned long		tx_queue_len;
+	unsigned long		doorbell_opt;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b0c04cf4851d..df05f81f5150 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -367,6 +367,23 @@ static ssize_t gro_flush_timeout_store(struct device *dev,
 }
 NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
 
+static int change_doorbell_opt(struct net_device *dev, unsigned long val)
+{
+	dev->doorbell_opt = val;
+	return 0;
+}
+
+static ssize_t doorbell_opt_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	return netdev_store(dev, attr, buf, len, change_doorbell_opt);
+}
+NETDEVICE_SHOW_RW(doorbell_opt, fmt_ulong);
+
 static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
@@ -531,6 +548,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_phys_port_name.attr,
 	&dev_attr_phys_switch_id.attr,
 	&dev_attr_proto_down.attr,
+	&dev_attr_doorbell_opt.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
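
Note on the doorbell-avoidance test in mlx4_en_xmit() above: the check
(s32)(prod_bell - ncons) > 0 compares two free-running u32 counters in a way
that stays correct when either one wraps past 2^32. Below is a minimal
userspace sketch of that idiom, not driver code. The struct ring_counters type
and the doorbell_outstanding() and naive_outstanding() helpers are hypothetical
names made up for illustration. It assumes prod_bell holds the producer index
covered by the last doorbell and ncons holds the consumer index published by TX
completion; the part of the patch shown here does not confirm either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters read by the check. */
struct ring_counters {
	uint32_t prod_bell;	/* assumed: producer index at the last doorbell */
	uint32_t ncons;		/* assumed: consumer index seen by TX completion */
};

/*
 * Wrap-safe "prod_bell is ahead of ncons" test. The unsigned subtraction is
 * modulo 2^32, and reinterpreting the difference as signed gives the right
 * sign as long as the counters are less than 2^31 apart. The conversion to
 * int32_t is implementation-defined in ISO C, but it behaves as two's
 * complement wrapping on the compilers the kernel supports.
 */
static bool doorbell_outstanding(const struct ring_counters *r)
{
	return (int32_t)(r->prod_bell - r->ncons) > 0;
}

/* Naive comparison, shown only for contrast: it breaks once prod_bell wraps. */
static bool naive_outstanding(const struct ring_counters *r)
{
	return r->prod_bell > r->ncons;
}
```

With prod_bell = 2 and ncons = 0xfffffffe (the producer has wrapped and the
consumer has not), the modular difference is 4, so doorbell_outstanding()
still reports true while naive_outstanding() reports false.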


* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Cong Wang @ 2016-11-29  6:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, Roi Dayan, Jiri Pirko,
	Daniel Borkmann
In-Reply-To: <583B9D22.8090906@gmail.com>

On Sun, Nov 27, 2016 at 6:57 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Hi Cong,
>
> Thanks a lot for doing this. Can you rebase it on top of Daniel's patch
> though,
>
>  [PATCH net] net, sched: respect rcu grace period on cls destruction
>
> And then push the NULL pointer work for the cls_fw and cls_route
> classifiers into another patch.
>
> Then I believe the last thing to make this correct is to convert the
> call_rcu() paths to call_rcu_bh().

Sure, will rebase my patch once DaveM merges net into net-next.

Thanks.


* bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Brice Goglin @ 2016-11-29  6:57 UTC (permalink / raw)
  To: Linux Network Development list, Baoquan He

Hello

My Dell PowerEdge R815 no longer has working IPMI when I boot a 4.8
kernel; the BMC doesn't even respond to ping anymore. Its Ethernet devices
are four of these:

01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	DeviceName: Embedded NIC 1
	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: <access denied>
	Kernel driver in use: bnx2
	Kernel modules: bnx2

The only change in bnx2 between 4.7 and 4.8 appears to be this one:

commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
Author: Baoquan He <bhe@redhat.com>
Date:   Fri Sep 9 22:43:12 2016 +0800

    bnx2: Reset device during driver initialization

Could your patch actually break the BMC? What do I need to do to debug
this issue further?

Thanks
Brice


* Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
From: Cong Wang @ 2016-11-29  6:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Daniel Borkmann, David Miller, John Fastabend, Roi Dayan, ast,
	Hannes Frederic Sowa, Jiri Pirko, Linux Kernel Network Developers
In-Reply-To: <20161128104736.GX31360@linux.vnet.ibm.com>

On Mon, Nov 28, 2016 at 2:47 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> RCU callbacks are always executed in softirq context, so yes, you do need
> to use something like a work struct.  (Or a wakeup to a kthread or
> whatever.)

Thanks for the information.
