Netdev List
 help / color / mirror / Atom feed
* Re: WARNING: refcount bug in should_fail
From: Eric W. Biederman @ 2018-04-02 20:30 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot+, syzkaller-bugs, dvyukov, linux-fsdevel, linux-mm, netdev,
	viro
In-Reply-To: <201804011941.IAE69652.OHMVJLFtSOFFQO@I-love.SAKURA.ne.jp>

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> syzbot wrote:
>> > On Sun, Mar 4, 2018 at 6:57 AM, Tetsuo Handa
>> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> >> Switching from mm to fsdevel, for this report says that put_net(net) in
>> >> rpc_kill_sb() made net->count < 0 when mount_ns() failed due to
>> >> register_shrinker() failure.
>> 
>> >> Relevant commits will be
>> >> commit 9ee332d99e4d5a97 ("sget(): handle failures of  
>> >> register_shrinker()") and
>> >> commit d91ee87d8d85a080 ("vfs: Pass data, ns, and ns->userns to  
>> >> mount_ns.").
>> 
>> >> When sget_userns() in mount_ns() failed, mount_ns() returns an error  
>> >> code to
>> >> the caller without calling fill_super(). That is, get_net(sb->s_fs_info)  
>> >> was
>> >> not called by rpc_fill_super() (via fill_super callback passed to  
>> >> mount_ns())
>> >> but put_net(sb->s_fs_info) is called by rpc_kill_sb() (via fs->kill_sb()  
>> >> from
>> >> deactivate_locked_super()).
>> 
>> >> ----------
>> >> static struct dentry *
>> >> rpc_mount(struct file_system_type *fs_type,
>> >>                  int flags, const char *dev_name, void *data)
>> >> {
>> >>          struct net *net = current->nsproxy->net_ns;
>> >>          return mount_ns(fs_type, flags, data, net, net->user_ns,  
>> >> rpc_fill_super);
>> >> }
>> >> ----------
>> 
>> > Messed kernel output, this is definitely not in should_fail.
>> 
>> > #syz dup: WARNING: refcount bug in sk_alloc
>> 
>> Can't find the corresponding bug.
>> 
> I don't think this is a dup of existing bug.
> We need to fix either 9ee332d99e4d5a97 or d91ee87d8d85a080.

Even if expanding mount_ns to more filesystems was magically fixed,
proc would still have this issue with the pid namespace rather than
the net namespace.

This is a mess.  I will take a look and see if I can see a a fix.

Eric

^ permalink raw reply

* RE: [PATCH v5 12/14] fm10k: Report PCIe link properties with pcie_print_link_status()
From: Keller, Jacob E @ 2018-04-02 20:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Tal Gilboa, Tariq Toukan, Ariel Elior, Ganesh Goudar,
	Kirsher, Jeffrey T, everest-linux-l2@cavium.com,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <20180402203144.GM9322@bhelgaas-glaptop.roam.corp.google.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Bjorn Helgaas
> Sent: Monday, April 02, 2018 1:32 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Tal Gilboa <talgi@mellanox.com>; Tariq Toukan <tariqt@mellanox.com>; Ariel
> Elior <ariel.elior@cavium.com>; Ganesh Goudar <ganeshgr@chelsio.com>;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com;
> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: [PATCH v5 12/14] fm10k: Report PCIe link properties with
> pcie_print_link_status()
> 
> On Mon, Apr 02, 2018 at 03:56:06PM +0000, Keller, Jacob E wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > > Sent: Friday, March 30, 2018 2:06 PM
> > > To: Tal Gilboa <talgi@mellanox.com>
> > > Cc: Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> > > <jacob.e.keller@intel.com>; Ariel Elior <ariel.elior@cavium.com>; Ganesh
> > > Goudar <ganeshgr@chelsio.com>; Kirsher, Jeffrey T
> > > <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com; intel-wired-
> > > lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > linux-pci@vger.kernel.org
> > > Subject: [PATCH v5 12/14] fm10k: Report PCIe link properties with
> > > pcie_print_link_status()
> > >
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > >
> > > Use pcie_print_link_status() to report PCIe link speed and possible
> > > limitations instead of implementing this in the driver itself.
> > >
> > > Note that pcie_get_minimum_link() can return misleading information
> because
> > > it finds the slowest link and the narrowest link without considering the
> > > total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
> > > 2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
> > > corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
> > > about 2000 MB/s for a 16 GT/s x1 link.
> >
> > This comment is about what's being fixed, so it would have been easier to
> > parse if it were written to more clearly indicate that we're removing
> > (and not adding) this behavior.
> 
> Good point.  Is this any better?
> 
>   fm10k: Report PCIe link properties with pcie_print_link_status()
> 
>   Previously the driver used pcie_get_minimum_link() to warn when the NIC
>   is in a slot that can't supply as much bandwidth as the NIC could use.
> 
>   pcie_get_minimum_link() can be misleading because it finds the slowest link
>   and the narrowest link (which may be different links) without considering
>   the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
>   2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
>   bandwidth, not the true available bandwidth of about 1969 MB/s for a
>   16 GT/s x1 link.
> 
>   Use pcie_print_link_status() to report PCIe link speed and possible
>   limitations instead of implementing this in the driver itself.  This finds
>   the slowest link in the path to the device by computing the total bandwidth
>   of each link and compares that with the capabilities of the device.
> 
>   Note that the driver previously used dev_warn() to suggest using a
>   different slot, but pcie_print_link_status() uses dev_info() because if the
>   platform has no faster slot available, the user can't do anything about the
>   warning and may not want to be bothered with it.

Perfect! Thanks!

-Jake

^ permalink raw reply

* Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited
From: Tal Gilboa @ 2018-04-02 21:09 UTC (permalink / raw)
  To: Keller, Jacob E, Bjorn Helgaas
  Cc: Tariq Toukan, Ariel Elior, Ganesh Goudar, Kirsher, Jeffrey T,
	everest-linux-l2@cavium.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
In-Reply-To: <02874ECE860811409154E81DA85FBB5882D49AFE@ORSMSX115.amr.corp.intel.com>

On 4/2/2018 11:25 PM, Keller, Jacob E wrote:
> 
> 
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
>> Sent: Monday, April 02, 2018 12:58 PM
>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>> Cc: Tal Gilboa <talgi@mellanox.com>; Tariq Toukan <tariqt@mellanox.com>; Ariel
>> Elior <ariel.elior@cavium.com>; Ganesh Goudar <ganeshgr@chelsio.com>;
>> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com;
>> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org; linux-pci@vger.kernel.org
>> Subject: Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed
>> and whether it's limited
>>
>> On Mon, Apr 02, 2018 at 04:25:17PM +0000, Keller, Jacob E wrote:
>>>> -----Original Message-----
>>>> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
>>>> Sent: Friday, March 30, 2018 2:05 PM
>>>> To: Tal Gilboa <talgi@mellanox.com>
>>>> Cc: Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
>>>> <jacob.e.keller@intel.com>; Ariel Elior <ariel.elior@cavium.com>; Ganesh
>>>> Goudar <ganeshgr@chelsio.com>; Kirsher, Jeffrey T
>>>> <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com; intel-wired-
>>>> lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> linux-pci@vger.kernel.org
>>>> Subject: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed
>> and
>>>> whether it's limited
>>>>
>>>> From: Tal Gilboa <talgi@mellanox.com>
>>>>
>>>> Add pcie_print_link_status().  This logs the current settings of the link
>>>> (speed, width, and total available bandwidth).
>>>>
>>>> If the device is capable of more bandwidth but is limited by a slower
>>>> upstream link, we include information about the link that limits the
>>>> device's performance.
>>>>
>>>> The user may be able to move the device to a different slot for better
>>>> performance.
>>>>
>>>> This provides a unified method for all PCI devices to report status and
>>>> issues, instead of each device reporting in a different way, using
>>>> different code.
>>>>
>>>> Signed-off-by: Tal Gilboa <talgi@mellanox.com>
>>>> [bhelgaas: changelog, reword log messages, print device capabilities when
>>>> not limited]
>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> ---
>>>>   drivers/pci/pci.c   |   29 +++++++++++++++++++++++++++++
>>>>   include/linux/pci.h |    1 +
>>>>   2 files changed, 30 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index e00d56b12747..cec7aed09f6b 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -5283,6 +5283,35 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev,
>>>> enum pci_bus_speed *speed,
>>>>   	return *width * PCIE_SPEED2MBS_ENC(*speed);
>>>>   }
>>>>
>>>> +/**
>>>> + * pcie_print_link_status - Report the PCI device's link speed and width
>>>> + * @dev: PCI device to query
>>>> + *
>>>> + * Report the available bandwidth at the device.  If this is less than the
>>>> + * device is capable of, report the device's maximum possible bandwidth and
>>>> + * the upstream link that limits its performance to less than that.
>>>> + */
>>>> +void pcie_print_link_status(struct pci_dev *dev)
>>>> +{
>>>> +	enum pcie_link_width width, width_cap;
>>>> +	enum pci_bus_speed speed, speed_cap;
>>>> +	struct pci_dev *limiting_dev = NULL;
>>>> +	u32 bw_avail, bw_cap;
>>>> +
>>>> +	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>>> +	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed,
>>>> &width);
>>>> +
>>>> +	if (bw_avail >= bw_cap)
>>>> +		pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
>>>> +			 bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
>>>> +	else
>>>> +		pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d
>>>> link at %s (capable of %d Mb/s with %s x%d link)\n",
>>>> +			 bw_avail, PCIE_SPEED2STR(speed), width,
>>>> +			 limiting_dev ? pci_name(limiting_dev) : "<unknown>",
>>>> +			 bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
>>>> +}
>>>
>>> Personally, I would make thic last one a pci_warn() to indicate it at a
>>> higher log level, but I'm  ok with the wording, and if consensus is that
>>> this should be at info, I'm ok with that.
>>
>> Tal's original patch did have a pci_warn() here, and we went back and
>> forth a bit.  They get bug reports when a device doesn't perform as
>> expected, which argues for pci_warn().  But they also got feedback
>> saying warnings are a bit too much, which argues for pci_info() [1]
>>
>> I don't have a really strong opinion either way.  I have a slight
>> preference for info because the user may not be able to do anything
>> about it (there may not be a faster slot available), and I think
>> distros are usually configured so a warning interrupts the smooth
>> graphical boot.
>>
>> It looks like mlx4, fm10k, and ixgbe currently use warnings, while
>> bnx2x, bnxt_en, and cxgb4 use info.  It's a tie so far :)
>>
>> [1] https://lkml.kernel.org/r/e47f3628-b56c-4d0a-f18b-
>> 5ffaf261ccc0@mellanox.com
>>
> 
> With that information, I'm fine with the proposal to display this as only an info. The message is still printed and can be used for debugging purposes, and I think that's really enough.
> 
>> Here's a proposal for printing the bandwidth as "x.xxx Gb/s":
> 
> Nice, I like that also.
> 
> Regards,
> Jake
> 

Same here for both.

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-04-02 21:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers
In-Reply-To: <20180330114905.GA10246@infradead.org>

On Fri, 30 Mar 2018 04:49:05 -0700, Christoph Hellwig wrote:
> On Thu, Mar 29, 2018 at 11:22:31AM -0700, Jakub Kicinski wrote:
> > Some user space depends on driver allowing sriov_totalvfs to be
> > enabled.  
> 
> I can't make sene of this sentence.  Can you explain what user space
> code depends on what semantics?  The sriov_totalvfs file should show
> up for any device supporting SR-IOV as far as I can tell.

Ugh, I took me a while to understand what I meant myself...  So what I
meant is that this should generally work:

# cat .../sriov_totalvfs > .../sriov_numvfs

I.e. one should be able to "enable" the number of VFs advertised in
sriov_totalvfs.  I will rephrase!

> > For devices which VF support depends on loaded FW we
> > have the pci_sriov_{g,s}et_totalvfs() API.  However, this API
> > uses 0 as a special "unset" value, meaning drivers can't limit
> > sriov_totalvfs to 0.  Change the special value to be U16_MAX.
> > Use simple min() to determine actual totalvfs.  
> 
> Please use a PCI_MAX_VFS or similar define instead of plain U16_MAX or ~0.

I think I may just go with what I think Alex is suggesting and "unset"
the value to total_VFs.

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-04-02 21:18 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Christoph Hellwig, Bjorn Helgaas, linux-pci, Netdev, Sathya Perla,
	Felix Manlunas, John Fastabend, Jacob Keller, Donald Dutile,
	oss-drivers
In-Reply-To: <CAKgT0UfihwxLWf7BqP=Z87S83DPoqaR4MDq8UT8R4D6Puspy6w@mail.gmail.com>

On Fri, 30 Mar 2018 09:54:37 -0700, Alexander Duyck wrote:
> On Fri, Mar 30, 2018 at 4:49 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > On Thu, Mar 29, 2018 at 11:22:31AM -0700, Jakub Kicinski wrote:  
> >> Some user space depends on driver allowing sriov_totalvfs to be
> >> enabled.  
> >
> > I can't make sene of this sentence.  Can you explain what user space
> > code depends on what semantics?  The sriov_totalvfs file should show
> > up for any device supporting SR-IOV as far as I can tell.
> >  
> >>
> >> For devices which VF support depends on loaded FW we
> >> have the pci_sriov_{g,s}et_totalvfs() API.  However, this API
> >> uses 0 as a special "unset" value, meaning drivers can't limit
> >> sriov_totalvfs to 0.  Change the special value to be U16_MAX.
> >> Use simple min() to determine actual totalvfs.  
> >
> > Please use a PCI_MAX_VFS or similar define instead of plain U16_MAX or ~0.  
> 
> Actually is there any reason why driver_max_VFs couldn't just be
> initialized to the same value as total_VFs?
> 
> Also looking over the patch I don't see how writing ~0 would be
> accepted unless you also make changes to pci_sriov_set_totalvfs since
> it should fail the "numvfs > dev->sriov->total_VFs" check. You might
> just want to look at adding a new function that would reset the
> driver_max_VFs value instead of trying to write it to an arbitrary
> value from the driver.

Ack, the reset function plus using total_VFs as unset seems a lot
cleaner, thanks!

^ permalink raw reply

* Re: WARNING: refcount bug in should_fail
From: Al Viro @ 2018-04-02 21:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, syzbot+, syzkaller-bugs, dvyukov, linux-fsdevel,
	linux-mm, netdev
In-Reply-To: <87lge5z6yn.fsf@xmission.com>

On Mon, Apr 02, 2018 at 03:30:56PM -0500, Eric W. Biederman wrote:
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> > I don't think this is a dup of existing bug.
> > We need to fix either 9ee332d99e4d5a97 or d91ee87d8d85a080.
> 
> Even if expanding mount_ns to more filesystems was magically fixed,
> proc would still have this issue with the pid namespace rather than
> the net namespace.
> 
> This is a mess.  I will take a look and see if I can see a a fix.

It's trivially fixable, and there's no need to modify mount_ns() at
all.

All we need is for rpc_kill_sb() to recognize whether we are already
through the point in rpc_fill_super() where the refcount is bumped.
That's it.

The most trivial way to do that is to move
	net = get_net(sb->s_fs_info);
past
        if (!root)
                return -ENOMEM;
in the latter and have
out:
	if (!sb->s_root)
		net = NULL;
        kill_litter_super(sb);
	if (net)
		put_net(net);
in the end of the former.  And similar changes in other affected
instances.

^ permalink raw reply

* Re: WARNING: refcount bug in should_fail
From: Al Viro @ 2018-04-02 21:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, syzbot+, syzkaller-bugs, dvyukov, linux-fsdevel,
	linux-mm, netdev
In-Reply-To: <20180402215212.GF30522@ZenIV.linux.org.uk>

On Mon, Apr 02, 2018 at 10:52:12PM +0100, Al Viro wrote:
> On Mon, Apr 02, 2018 at 03:30:56PM -0500, Eric W. Biederman wrote:
> > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> 
> > > I don't think this is a dup of existing bug.
> > > We need to fix either 9ee332d99e4d5a97 or d91ee87d8d85a080.
> > 
> > Even if expanding mount_ns to more filesystems was magically fixed,
> > proc would still have this issue with the pid namespace rather than
> > the net namespace.
> > 
> > This is a mess.  I will take a look and see if I can see a a fix.
> 
> It's trivially fixable, and there's no need to modify mount_ns() at
> all.
> 
> All we need is for rpc_kill_sb() to recognize whether we are already
> through the point in rpc_fill_super() where the refcount is bumped.
> That's it.
> 
> The most trivial way to do that is to move
> 	net = get_net(sb->s_fs_info);
> past
>         if (!root)
>                 return -ENOMEM;
> in the latter and have
> out:
> 	if (!sb->s_root)
> 		net = NULL;
>         kill_litter_super(sb);
> 	if (net)
> 		put_net(net);
> in the end of the former.  And similar changes in other affected
> instances.

FWIW, I'm going through the ->kill_sb() instances, fixing that sort
of bugs (most of them preexisting, but I should've checked instead
of assuming that everything's fine).  Will push out later tonight.

^ permalink raw reply

* [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-04-02 22:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, netdev, Sathya Perla, Felix Manlunas, alexander.duyck,
	john.fastabend, Jacob Keller, Donald Dutile, oss-drivers,
	Christoph Hellwig, Jakub Kicinski

Some user space depends on enabling sriov_totalvfs number of VFs
to not fail, e.g.:

$ cat .../sriov_totalvfs > .../sriov_numvfs

For devices which VF support depends on loaded FW we have the
pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
a special "unset" value, meaning drivers can't limit sriov_totalvfs
to 0.  Remove the special values completely and simply initialize
driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
Add a helper for drivers to reset the VF limit back to total.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c |  6 +++---
 drivers/pci/iov.c                             | 27 +++++++++++++++++++++------
 include/linux/pci.h                           |  2 ++
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index c4b1f344b4da..a76d177e40dd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -123,7 +123,7 @@ static int nfp_pcie_sriov_read_nfd_limit(struct nfp_pf *pf)
 		return pci_sriov_set_totalvfs(pf->pdev, pf->limit_vfs);
 
 	pf->limit_vfs = ~0;
-	pci_sriov_set_totalvfs(pf->pdev, 0); /* 0 is unset */
+	pci_sriov_reset_totalvfs(pf->pdev);
 	/* Allow any setting for backwards compatibility if symbol not found */
 	if (err == -ENOENT)
 		return 0;
@@ -537,7 +537,7 @@ static int nfp_pci_probe(struct pci_dev *pdev,
 err_net_remove:
 	nfp_net_pci_remove(pf);
 err_sriov_unlimit:
-	pci_sriov_set_totalvfs(pf->pdev, 0);
+	pci_sriov_reset_totalvfs(pf->pdev);
 err_fw_unload:
 	kfree(pf->rtbl);
 	nfp_mip_close(pf->mip);
@@ -570,7 +570,7 @@ static void nfp_pci_remove(struct pci_dev *pdev)
 	nfp_hwmon_unregister(pf);
 
 	nfp_pcie_sriov_disable(pdev);
-	pci_sriov_set_totalvfs(pf->pdev, 0);
+	pci_sriov_reset_totalvfs(pf->pdev);
 
 	nfp_net_pci_remove(pf);
 
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..c63ea870d8be 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -443,6 +443,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
 	iov->nres = nres;
 	iov->ctrl = ctrl;
 	iov->total_VFs = total;
+	iov->driver_max_VFs = total;
 	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
 	iov->pgsz = pgsz;
 	iov->self = dev;
@@ -788,12 +789,29 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 }
 EXPORT_SYMBOL_GPL(pci_sriov_set_totalvfs);
 
+/**
+ * pci_sriov_reset_totalvfs -- return the TotalVFs value to the default
+ * @dev: the PCI PF device
+ *
+ * Should be called from PF driver's remove routine with
+ * device's mutex held.
+ */
+void pci_sriov_reset_totalvfs(struct pci_dev *dev)
+{
+	/* Shouldn't change if VFs already enabled */
+	if (!dev->is_physfn || dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
+		return;
+
+	dev->sriov->driver_max_VFs = dev->sriov->total_VFs;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_reset_totalvfs);
+
 /**
  * pci_sriov_get_totalvfs -- get total VFs supported on this device
  * @dev: the PCI PF device
  *
- * For a PCIe device with SRIOV support, return the PCIe
- * SRIOV capability value of TotalVFs or the value of driver_max_VFs
+ * For a PCIe device with SRIOV support, return the value of driver_max_VFs
+ * which can be equal to the PCIe SRIOV capability value of TotalVFs or lower
  * if the driver reduced it.  Otherwise 0.
  */
 int pci_sriov_get_totalvfs(struct pci_dev *dev)
@@ -801,9 +819,6 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
 	if (!dev->is_physfn)
 		return 0;
 
-	if (dev->sriov->driver_max_VFs)
-		return dev->sriov->driver_max_VFs;
-
-	return dev->sriov->total_VFs;
+	return dev->sriov->driver_max_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..95fde8850393 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1952,6 +1952,7 @@ void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
+void pci_sriov_reset_totalvfs(struct pci_dev *dev);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
@@ -1978,6 +1979,7 @@ static inline int pci_vfs_assigned(struct pci_dev *dev)
 { return 0; }
 static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
+static inline void pci_sriov_reset_totalvfs(struct pci_dev *dev) { }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
-- 
2.16.2

^ permalink raw reply related

* [PATCH net-next] rxrpc: Fix undefined packet handling
From: David Howells @ 2018-04-02 22:51 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
should just be discarded rather than being aborted like other undefined
packet types.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c    |    6 ++++++
 net/rxrpc/protocol.h |    6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 21800e6f5019..0410d2277ca2 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1200,6 +1200,12 @@ void rxrpc_data_ready(struct sock *udp_sk)
 		    !rxrpc_validate_jumbo(skb))
 			goto bad_message;
 		break;
+
+		/* Packet types 9-11 should just be ignored. */
+	case RXRPC_PACKET_TYPE_PARAMS:
+	case RXRPC_PACKET_TYPE_10:
+	case RXRPC_PACKET_TYPE_11:
+		goto discard;
 	}
 
 	rcu_read_lock();
diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h
index 4bddcf3face3..93da73bf7098 100644
--- a/net/rxrpc/protocol.h
+++ b/net/rxrpc/protocol.h
@@ -46,6 +46,9 @@ struct rxrpc_wire_header {
 #define RXRPC_PACKET_TYPE_CHALLENGE	6	/* connection security challenge (SRVR->CLNT) */
 #define RXRPC_PACKET_TYPE_RESPONSE	7	/* connection secutity response (CLNT->SRVR) */
 #define RXRPC_PACKET_TYPE_DEBUG		8	/* debug info request */
+#define RXRPC_PACKET_TYPE_PARAMS	9	/* Parameter negotiation (unspec'd, ignore) */
+#define RXRPC_PACKET_TYPE_10		10	/* Ignored */
+#define RXRPC_PACKET_TYPE_11		11	/* Ignored */
 #define RXRPC_PACKET_TYPE_VERSION	13	/* version string request */
 #define RXRPC_N_PACKET_TYPES		14	/* number of packet types (incl type 0) */
 
@@ -78,6 +81,9 @@ struct rxrpc_wire_header {
 		(1 << RXRPC_PACKET_TYPE_CHALLENGE) |	\
 		(1 << RXRPC_PACKET_TYPE_RESPONSE) |	\
 		/*(1 << RXRPC_PACKET_TYPE_DEBUG) | */	\
+		(1 << RXRPC_PACKET_TYPE_PARAMS) |	\
+		(1 << RXRPC_PACKET_TYPE_10) |		\
+		(1 << RXRPC_PACKET_TYPE_11) |		\
 		(1 << RXRPC_PACKET_TYPE_VERSION))
 
 /*****************************************************************************/

^ permalink raw reply related

* [PATCH net 0/2] net: Broadcom drivers sparse fixes
From: Florian Fainelli @ 2018-04-02 22:58 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Doug Berger, open list

Hi all,

This patch series fixes the same warning reported by sparse in bcmsysport and
bcmgenet in the code that deals with inserting the TX checksum pointers:

drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26:    expected unsigned short [unsigned] [usertype] val
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26:    got restricted __be16 [usertype] protocol

This patch fixes both issues by using the same construct and not swapping
skb->protocol but instead the values we are checking against.

Florian Fainelli (2):
  net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
  net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()

 drivers/net/ethernet/broadcom/bcmsysport.c     | 11 ++++++-----
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 11 ++++++-----
 2 files changed, 12 insertions(+), 10 deletions(-)

-- 
2.14.1

^ permalink raw reply

* [PATCH net 1/2] net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
From: Florian Fainelli @ 2018-04-02 22:58 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Doug Berger, open list
In-Reply-To: <20180402225856.4351-1-f.fainelli@gmail.com>

skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index b1e35a9accf1..aeb329690f78 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1460,7 +1460,7 @@ static struct sk_buff *bcmgenet_put_tx_csum(struct net_device *dev,
 	struct sk_buff *new_skb;
 	u16 offset;
 	u8 ip_proto;
-	u16 ip_ver;
+	__be16 ip_ver;
 	u32 tx_csum_info;
 
 	if (unlikely(skb_headroom(skb) < sizeof(*status))) {
@@ -1480,12 +1480,12 @@ static struct sk_buff *bcmgenet_put_tx_csum(struct net_device *dev,
 	status = (struct status_64 *)skb->data;
 
 	if (skb->ip_summed  == CHECKSUM_PARTIAL) {
-		ip_ver = htons(skb->protocol);
+		ip_ver = skb->protocol;
 		switch (ip_ver) {
-		case ETH_P_IP:
+		case htons(ETH_P_IP):
 			ip_proto = ip_hdr(skb)->protocol;
 			break;
-		case ETH_P_IPV6:
+		case htons(ETH_P_IPV6):
 			ip_proto = ipv6_hdr(skb)->nexthdr;
 			break;
 		default:
@@ -1501,7 +1501,8 @@ static struct sk_buff *bcmgenet_put_tx_csum(struct net_device *dev,
 		 */
 		if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP) {
 			tx_csum_info |= STATUS_TX_CSUM_LV;
-			if (ip_proto == IPPROTO_UDP && ip_ver == ETH_P_IP)
+			if (ip_proto == IPPROTO_UDP &&
+			    ip_ver == htons(ETH_P_IP))
 				tx_csum_info |= STATUS_TX_CSUM_PROTO_UDP;
 		} else {
 			tx_csum_info = 0;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net 2/2] net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()
From: Florian Fainelli @ 2018-04-02 22:58 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Doug Berger, open list
In-Reply-To: <20180402225856.4351-1-f.fainelli@gmail.com>

skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 3fc549b88c43..879db1c700bd 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1133,7 +1133,7 @@ static struct sk_buff *bcm_sysport_insert_tsb(struct sk_buff *skb,
 	u32 csum_info;
 	u8 ip_proto;
 	u16 csum_start;
-	u16 ip_ver;
+	__be16 ip_ver;
 
 	/* Re-allocate SKB if needed */
 	if (unlikely(skb_headroom(skb) < sizeof(*tsb))) {
@@ -1152,12 +1152,12 @@ static struct sk_buff *bcm_sysport_insert_tsb(struct sk_buff *skb,
 	memset(tsb, 0, sizeof(*tsb));
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		ip_ver = htons(skb->protocol);
+		ip_ver = skb->protocol;
 		switch (ip_ver) {
-		case ETH_P_IP:
+		case htons(ETH_P_IP):
 			ip_proto = ip_hdr(skb)->protocol;
 			break;
-		case ETH_P_IPV6:
+		case htons(ETH_P_IPV6):
 			ip_proto = ipv6_hdr(skb)->nexthdr;
 			break;
 		default:
@@ -1171,7 +1171,8 @@ static struct sk_buff *bcm_sysport_insert_tsb(struct sk_buff *skb,
 
 		if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP) {
 			csum_info |= L4_LENGTH_VALID;
-			if (ip_proto == IPPROTO_UDP && ip_ver == ETH_P_IP)
+			if (ip_proto == IPPROTO_UDP &&
+			    ip_ver == htons(ETH_P_IP))
 				csum_info |= L4_UDP;
 		} else {
 			csum_info = 0;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net] net: dsa: b53: Fix sparse warnings in b53_mmap.c
From: Florian Fainelli @ 2018-04-02 23:17 UTC (permalink / raw)
  To: netdev
  Cc: jonas.gorski, Florian Fainelli, Andrew Lunn, Vivien Didelot,
	open list

sparse complains about the following warnings:

drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in
initializer (different address spaces)
drivers/net/dsa/b53/b53_mmap.c:33:31:    expected unsigned char
[noderef] [usertype] <asn:2>*regs
drivers/net/dsa/b53/b53_mmap.c:33:31:    got void *priv

and indeed, while what we are doing is functional, we are dereferencing
a void * pointer into a void __iomem * which is not great. Just use the
defined b53_mmap_priv structure which holds our register base and use
that.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/b53/b53_mmap.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_mmap.c b/drivers/net/dsa/b53/b53_mmap.c
index ef63d24fef81..c628d0980c0b 100644
--- a/drivers/net/dsa/b53/b53_mmap.c
+++ b/drivers/net/dsa/b53/b53_mmap.c
@@ -30,7 +30,8 @@ struct b53_mmap_priv {
 
 static int b53_mmap_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	*val = readb(regs + (page << 8) + reg);
 
@@ -39,7 +40,8 @@ static int b53_mmap_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
 
 static int b53_mmap_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	if (WARN_ON(reg % 2))
 		return -EINVAL;
@@ -54,7 +56,8 @@ static int b53_mmap_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
 
 static int b53_mmap_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	if (WARN_ON(reg % 4))
 		return -EINVAL;
@@ -69,7 +72,8 @@ static int b53_mmap_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
 
 static int b53_mmap_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	if (WARN_ON(reg % 2))
 		return -EINVAL;
@@ -107,7 +111,8 @@ static int b53_mmap_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
 
 static int b53_mmap_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 	u32 hi, lo;
 
 	if (WARN_ON(reg % 4))
@@ -128,7 +133,8 @@ static int b53_mmap_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
 
 static int b53_mmap_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	writeb(value, regs + (page << 8) + reg);
 
@@ -138,7 +144,8 @@ static int b53_mmap_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
 static int b53_mmap_write16(struct b53_device *dev, u8 page, u8 reg,
 			    u16 value)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	if (WARN_ON(reg % 2))
 		return -EINVAL;
@@ -154,7 +161,8 @@ static int b53_mmap_write16(struct b53_device *dev, u8 page, u8 reg,
 static int b53_mmap_write32(struct b53_device *dev, u8 page, u8 reg,
 			    u32 value)
 {
-	u8 __iomem *regs = dev->priv;
+	struct b53_mmap_priv *priv = dev->priv;
+	void __iomem *regs = priv->regs;
 
 	if (WARN_ON(reg % 4))
 		return -EINVAL;
@@ -223,12 +231,19 @@ static const struct b53_io_ops b53_mmap_ops = {
 static int b53_mmap_probe(struct platform_device *pdev)
 {
 	struct b53_platform_data *pdata = pdev->dev.platform_data;
+	struct b53_mmap_priv *priv;
 	struct b53_device *dev;
 
 	if (!pdata)
 		return -EINVAL;
 
-	dev = b53_switch_alloc(&pdev->dev, &b53_mmap_ops, pdata->regs);
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->regs = pdata->regs;
+
+	dev = b53_switch_alloc(&pdev->dev, &b53_mmap_ops, priv);
 	if (!dev)
 		return -ENOMEM;
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net] net: dsa: mt7530: Use NULL instead of plain integer
From: Florian Fainelli @ 2018-04-02 23:24 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Sean Wang, Andrew Lunn, Vivien Didelot,
	open list

We would be passing 0 instead of NULL as the rsp argument to
mt7530_fdb_cmd(), fix that.

Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/mt7530.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 4e53c5ce23ff..a7f6c0a62fc7 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -917,7 +917,7 @@ mt7530_port_fdb_add(struct dsa_switch *ds, int port,
 
 	mutex_lock(&priv->reg_mutex);
 	mt7530_fdb_write(priv, vid, port_mask, addr, -1, STATIC_ENT);
-	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, 0);
+	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, NULL);
 	mutex_unlock(&priv->reg_mutex);
 
 	return ret;
@@ -933,7 +933,7 @@ mt7530_port_fdb_del(struct dsa_switch *ds, int port,
 
 	mutex_lock(&priv->reg_mutex);
 	mt7530_fdb_write(priv, vid, port_mask, addr, -1, STATIC_EMP);
-	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, 0);
+	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, NULL);
 	mutex_unlock(&priv->reg_mutex);
 
 	return ret;
@@ -1293,7 +1293,7 @@ mt7530_setup(struct dsa_switch *ds)
 	}
 
 	/* Flush the FDB table */
-	ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, 0);
+	ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, NULL);
 	if (ret < 0)
 		return ret;
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth
From: Jacob Keller @ 2018-04-03  0:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Tal Gilboa, Tariq Toukan, Jacob Keller, Ariel Elior,
	Ganesh Goudar, Jeff Kirsher, everest-linux-l2, intel-wired-lan,
	netdev, linux-kernel, linux-pci
In-Reply-To: <20180402140501.GA244675@bhelgaas-glaptop.roam.corp.google.com>

On Mon, Apr 2, 2018 at 7:05 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> +/* PCIe speed to Mb/s reduced by encoding overhead */
> +#define PCIE_SPEED2MBS_ENC(speed) \
> +       ((speed) == PCIE_SPEED_16_0GT ? (16000*(128/130)) : \
> +        (speed) == PCIE_SPEED_8_0GT  ?  (8000*(128/130)) : \
> +        (speed) == PCIE_SPEED_5_0GT  ?  (5000*(8/10)) : \
> +        (speed) == PCIE_SPEED_2_5GT  ?  (2500*(8/10)) : \
> +        0)
> +

Should this be "(speed * x ) / y" instead? wouldn't they calculate
128/130 and truncate that to zero before multiplying by the speed? Or
are compilers smart enough to do this the other way to avoid the
losses?

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH net-next 4/5] tun: Add support for SCTP checksum offload
From: kbuild test robot @ 2018-04-03  0:49 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: kbuild-all, netdev, linux-sctp, virtualization, mst, jasowang,
	nhorman, Vladislav Yasevich
In-Reply-To: <20180402134006.10111-5-vyasevic@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]

Hi Vladislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Vladislav-Yasevich/virtio-net-Add-SCTP-checksum-offload-support/20180402-221407
config: m68k-hp300_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

Note: the linux-review/Vladislav-Yasevich/virtio-net-Add-SCTP-checksum-offload-support/20180402-221407 HEAD 5e0497a085e70055a1981959802173f4ff05c86b builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers/net/tun.c: In function 'set_offload':
>> drivers/net/tun.c:2722:12: error: 'TUN_F_SCTP_CSUM' undeclared (first use in this function); did you mean 'TUN_F_CSUM'?
     if (arg & TUN_F_SCTP_CSUM) {
               ^~~~~~~~~~~~~~~
               TUN_F_CSUM
   drivers/net/tun.c:2722:12: note: each undeclared identifier is reported only once for each function it appears in

vim +2722 drivers/net/tun.c

  2696	
  2697	/* This is like a cut-down ethtool ops, except done via tun fd so no
  2698	 * privs required. */
  2699	static int set_offload(struct tun_struct *tun, unsigned long arg)
  2700	{
  2701		netdev_features_t features = 0;
  2702	
  2703		if (arg & TUN_F_CSUM) {
  2704			features |= NETIF_F_HW_CSUM;
  2705			arg &= ~TUN_F_CSUM;
  2706	
  2707			if (arg & (TUN_F_TSO4|TUN_F_TSO6)) {
  2708				if (arg & TUN_F_TSO_ECN) {
  2709					features |= NETIF_F_TSO_ECN;
  2710					arg &= ~TUN_F_TSO_ECN;
  2711				}
  2712				if (arg & TUN_F_TSO4)
  2713					features |= NETIF_F_TSO;
  2714				if (arg & TUN_F_TSO6)
  2715					features |= NETIF_F_TSO6;
  2716				arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
  2717			}
  2718	
  2719			arg &= ~TUN_F_UFO;
  2720		}
  2721	
> 2722		if (arg & TUN_F_SCTP_CSUM) {
  2723			features |= NETIF_F_SCTP_CRC;
  2724			arg &= ~TUN_F_SCTP_CSUM;
  2725		}
  2726	
  2727		/* This gives the user a way to test for new features in future by
  2728		 * trying to set them. */
  2729		if (arg)
  2730			return -EINVAL;
  2731	
  2732		tun->set_features = features;
  2733		tun->dev->wanted_features &= ~TUN_USER_FEATURES;
  2734		tun->dev->wanted_features |= features;
  2735		netdev_update_features(tun->dev);
  2736	
  2737		return 0;
  2738	}
  2739	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12572 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf-next 0/3] bpf/verifier: subprog/func_call simplifications
From: Alexei Starovoitov @ 2018-04-03  1:08 UTC (permalink / raw)
  To: Edward Cree; +Cc: Daniel Borkmann, netdev
In-Reply-To: <d13f6b60-3632-3e4d-111a-2edb4198b473@solarflare.com>

On Thu, Mar 29, 2018 at 11:44:17PM +0100, Edward Cree wrote:
> By storing subprog boundaries as a subprogno mark on each insn, rather than
>  a start (and implicit end) for each subprog, we collect a number of gains:
> * More efficient determination of which subprog contains a given insn, and
>   thus of find_subprog (which subprog begins at a given insn).
> * Number of verifier "full recursive walk" passes is reduced, since most of
>   the work is done in the main insn walk (do_check()).  Leftover work in
>   other passes is mostly linear scans (O(insn_cnt)) or, in the case of
>   check_max_stack_depth(), a topological sort (O(subprog_cnt)).
> 
> Some other changes were also included to support this:
> * Per-subprog info is stored in env->subprog_info, an array of structs,
>   rather than several arrays with a common index.
> * Call graph is now stored in the new bpf_subprog_info struct; used here
>   for check_max_stack_depth() but may have other uses too.
> 
> Along with this, patch #3 puts parent pointers (used by liveness analysis)
>  in the registers instead of the func_state or verifier_state, so that we
>  don't need skip_callee() machinery.  This also does the right thing for
>  stack slots, so they don't need their own special handling for liveness
>  marking either.

I like patch 3 and going to play with it.
How did you test it ? Do you have processed_insn numbers for
cilium or selftests programs before and after?
There should be no difference, right?
Essentially it's trading extra run-time cost of skip_callee logic
for higher memory overhead due to parent pointers in every register
state. I guess that's ok, since it does cleanup things nicely.

As far as patch 1 it was very difficult to review, since several logically
different things clamped together. So breaking it apart:
- converting two arrays of subprog_starts and subprog_stack_depth into
  single array of struct bpf_subprog_info is a good thing
- tsort is interesting, but not sure it's correct. more below
- but main change of combining subprog logic with do_check is no good.
The verifier have to move towards compiler style code analysis instead of
the other way around. It has to analyze the program in simple and
hopefully easy to understand passes instead of combinging everything
into one loop. We _have_ to get rid of do_check() approach and
corresponding insn_processed counter. That was and still is
the biggest and most pressing issue we need to solve in bpf verification.
The verifier has to switch to compiler style code analysis to scale.
The algorithms should be such that scale with thousands of instructions
and thousands of functions. The only way I see reaching that goal
is to do hierarchical program analysis in passes:
- identify subrpograms
- identify basic blocks, build control flow graph
- build dom graph, ssa and so on
- and do per-basic block liveness and data flow analysis that propagates
  the combined result further into other BBs along cfg edges.
There will be no 'do_check() across whole program' walk.
Combining subprog pass with do_check is going into opposite direction
of this long term work. Divide and conquer. Combining more things into
do_check is the opposite of this programming principle.
My short term plan is to split basic instruction correctness checks
out of do_check() loop into separate pass and run it early on.
It's necessary for bpf libraries to work, since the verifier cannot do
full data flow analysis at this point, but should build cfg and move
as much as possible out of instruction by instruction walk to scale
with number of bpf programs/libraries and sizes of them.

As far as tsort approach for determining max stack depth...
It's an interesting idea, but implementation is suffering from the same
'combine everything into one loop' coding issue.
I think updating total_stack_depth math should be separate from sorting,
since the same function can be part of different stacks with different max.
I don't see how updating global subprog_info[i].total_stack_depth can
be correct. It has to be different for every stack and the longest
stack is not necessarily the deepest. May be I'm missing something,
but combined sort and stack_depth math didn't make it easy to review.
I find existing check_max_stack_depth() algo much easier to understand.

^ permalink raw reply

* [PATCH net-next] pptp: remove a buggy dst release in pptp_connect()
From: Eric Dumazet @ 2018-04-03  1:48 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Once dst has been cached in socket via sk_setup_caps(),
it is illegal to call ip_rt_put() (or dst_release()),
since sk_setup_caps() did not change dst refcount.

We can still dereference it since we hold socket lock.

Caugth by syzbot :

BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185
Write of size 4 at addr ffff8801c54dc040 by task syz-executor4/20088

CPU: 1 PID: 20088 Comm: syz-executor4 Not tainted 4.16.0+ #376
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1a7/0x27d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
 atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
 dst_release+0x27/0xa0 net/core/dst.c:185
 sk_dst_set include/net/sock.h:1812 [inline]
 sk_dst_reset include/net/sock.h:1824 [inline]
 sock_setbindtodevice net/core/sock.c:610 [inline]
 sock_setsockopt+0x431/0x1b20 net/core/sock.c:707
 SYSC_setsockopt net/socket.c:1845 [inline]
 SyS_setsockopt+0x2ff/0x360 net/socket.c:1828
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4552d9
RSP: 002b:00007f4878126c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007f48781276d4 RCX: 00000000004552d9
RDX: 0000000000000019 RSI: 0000000000000001 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000010 R09: 0000000000000000
R10: 00000000200010c0 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000526 R14: 00000000006fac30 R15: 0000000000000000

Allocated by task 20088:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3542
 dst_alloc+0x11f/0x1a0 net/core/dst.c:104
 rt_dst_alloc+0xe9/0x540 net/ipv4/route.c:1520
 __mkroute_output net/ipv4/route.c:2265 [inline]
 ip_route_output_key_hash_rcu+0xa49/0x2c60 net/ipv4/route.c:2493
 ip_route_output_key_hash+0x20b/0x370 net/ipv4/route.c:2322
 __ip_route_output_key include/net/route.h:126 [inline]
 ip_route_output_flow+0x26/0xa0 net/ipv4/route.c:2577
 ip_route_output_ports include/net/route.h:163 [inline]
 pptp_connect+0xa84/0x1170 drivers/net/ppp/pptp.c:453
 SYSC_connect+0x213/0x4a0 net/socket.c:1639
 SyS_connect+0x24/0x30 net/socket.c:1620
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 20082:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
 __cache_free mm/slab.c:3486 [inline]
 kmem_cache_free+0x83/0x2a0 mm/slab.c:3744
 dst_destroy+0x266/0x380 net/core/dst.c:140
 dst_destroy_rcu+0x16/0x20 net/core/dst.c:153
 __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
 rcu_do_batch kernel/rcu/tree.c:2675 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2930 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2897 [inline]
 rcu_process_callbacks+0xd6c/0x17b0 kernel/rcu/tree.c:2914
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285

The buggy address belongs to the object at ffff8801c54dc000
 which belongs to the cache ip_dst_cache of size 168
The buggy address is located 64 bytes inside of
 168-byte region [ffff8801c54dc000, ffff8801c54dc0a8)
The buggy address belongs to the page:
page:ffffea0007153700 count:1 mapcount:0 mapping:ffff8801c54dc000 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801c54dc000 0000000000000000 0000000100000010
raw: ffffea0006b34b20 ffffea0006b6c1e0 ffff8801d674a1c0 0000000000000000
page dumped because: kasan: bad access detected

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ppp/pptp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 8249d46a784493faedbfad5ec78212afab8c872e..c4267ecefd85271ac561e94d57c586179e66777c 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -464,7 +464,6 @@ static int pptp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	po->chan.mtu = dst_mtu(&rt->dst);
 	if (!po->chan.mtu)
 		po->chan.mtu = PPP_MRU;
-	ip_rt_put(rt);
 	po->chan.mtu -= PPTP_HEADER_OVERHEAD;
 
 	po->chan.hdrlen = 2 + sizeof(struct pptp_gre_header);
-- 
2.17.0.rc1.321.gba9d0f2565-goog

^ permalink raw reply related

* Re: [PATCH net] net: dsa: mt7530: Use NULL instead of plain integer
From: Sean Wang @ 2018-04-03  2:18 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Vivien Didelot,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, open list,
	Andrew Lunn
In-Reply-To: <20180402232414.19671-1-f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Mon, 2018-04-02 at 16:24 -0700, Florian Fainelli wrote:
> We would be passing 0 instead of NULL as the rsp argument to
> mt7530_fdb_cmd(), fix that.
> 

Acked-by: Sean Wang <sean.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>

BTW, does the part of the commit message should be updated with "passing
NULL instead of 0"?


> Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
> Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/dsa/mt7530.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index 4e53c5ce23ff..a7f6c0a62fc7 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -917,7 +917,7 @@ mt7530_port_fdb_add(struct dsa_switch *ds, int port,
>  
>  	mutex_lock(&priv->reg_mutex);
>  	mt7530_fdb_write(priv, vid, port_mask, addr, -1, STATIC_ENT);
> -	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, 0);
> +	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, NULL);
>  	mutex_unlock(&priv->reg_mutex);
>  
>  	return ret;
> @@ -933,7 +933,7 @@ mt7530_port_fdb_del(struct dsa_switch *ds, int port,
>  
>  	mutex_lock(&priv->reg_mutex);
>  	mt7530_fdb_write(priv, vid, port_mask, addr, -1, STATIC_EMP);
> -	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, 0);
> +	ret = mt7530_fdb_cmd(priv, MT7530_FDB_WRITE, NULL);
>  	mutex_unlock(&priv->reg_mutex);
>  
>  	return ret;
> @@ -1293,7 +1293,7 @@ mt7530_setup(struct dsa_switch *ds)
>  	}
>  
>  	/* Flush the FDB table */
> -	ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, 0);
> +	ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, NULL);
>  	if (ret < 0)
>  		return ret;
>  

^ permalink raw reply

* Re: [PATCH 0/2] rhashtable_walk fixes
From: NeilBrown @ 2018-04-03  2:23 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, herbert, netdev, linux-kernel
In-Reply-To: <20180330.101826.1844442556880257787.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]

On Fri, Mar 30 2018, David Miller wrote:

> From: NeilBrown <neilb@suse.com>
> Date: Thu, 29 Mar 2018 12:19:09 +1100
>
>> These two patches apply on top of my previous "rhashtable: reset iter
>> when rhashtable_walk_start sees new table" patch.
>> 
>> The first fixes a bug that I found in rhltable_insert().
>> 
>> The second is an alternate to my "rhashtable: allow a walk of the hash
>> table without missing object."
>> This version doesn't require an API change and should be reliable for
>> rhltables too (my first version didn't handle these correctly).
>
> Neil, please don't mix and match patches.
>
> Also when you need to change a patch in a series, please post the entire
> new series not just the patch that changes.
>
> Patch #1 in this series is unnecessary.  As Herbert explained this has
> been fixed already.
>
> So please repost freshly the patches that are relevant and you want me
> to consider for inclusion.  Also be explicit and clear about which of
> my two networking trees you are targetting these changes.

Hi Dave,
 I'm sorry if I've caused some confusion, but I didn't think that I was
 submitting patches to you and know nothing about your two trees.
 I was submitting patches to Thomas and Herbert, the registered
 maintainers of rhashtable.  I assumed they would review, respond, and
 take responsibility for getting them upstream, if that's what they
 decided, based on whatever arrangements they have in place.

 If it is appropriate I can resend all of my patches that receive an
 Ack as a coherent series, and send this to you nominating a particular
 tree, but I'm unlikely to do that unless asked and told which tree to
 nominate.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] rhashtable_walk fixes
From: David Miller @ 2018-04-03  2:34 UTC (permalink / raw)
  To: neilb; +Cc: tgraf, herbert, netdev, linux-kernel
In-Reply-To: <8760591103.fsf@notabene.neil.brown.name>

From: NeilBrown <neilb@suse.com>
Date: Tue, 03 Apr 2018 12:23:40 +1000

>  I'm sorry if I've caused some confusion, but I didn't think that I was
>  submitting patches to you and know nothing about your two trees.
>  I was submitting patches to Thomas and Herbert, the registered
>  maintainers of rhashtable.  I assumed they would review, respond, and
>  take responsibility for getting them upstream, if that's what they
>  decided, based on whatever arrangements they have in place.
> 
>  If it is appropriate I can resend all of my patches that receive an
>  Ack as a coherent series, and send this to you nominating a particular
>  tree, but I'm unlikely to do that unless asked and told which tree to
>  nominate.

Herbert and Thomas generally review rhashtable patches, but rhashtable
itself is generally maintained in the networking tree(s).  So once
they review and ACK it, I would apply it.

^ permalink raw reply

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-04-03  2:59 UTC (permalink / raw)
  To: jeffrey.t.kirsher, Alexander Duyck
  Cc: netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel
In-Reply-To: <1522695990-31082-1-git-send-email-okaya@codeaurora.org>

Alex,

On 4/2/2018 3:06 PM, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory. I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Change since 7:
> 
> * API clarification with Linus and several architecture folks regarding
> writel()
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
> 
> * removed wmb() in front of writel() at several places.
> * remove old IA64 comments regarding ordering.
> 

What do you think about this version? Did I miss any SMP barriers?


> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> 
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 15 ++-----------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 22 ++++++++----------
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  8 +------
>  drivers/net/ethernet/intel/igb/igb_main.c         | 14 ++----------
>  drivers/net/ethernet/intel/igbvf/netdev.c         | 16 +++++---------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 23 ++-----------------
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ++++++++---------------
>  8 files changed, 30 insertions(+), 100 deletions(-)
> 

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the pci tree
From: Stephen Rothwell @ 2018-04-03  3:14 UTC (permalink / raw)
  To: David Miller, Networking, Bjorn Helgaas
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Tariq Toukan,
	Saeed Mahameed, Tal Gilboa

[-- Attachment #1: Type: text/plain, Size: 2657 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/mellanox/mlx5/core/en_main.c

between commit:

  2907938d2375 ("net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth")

from the pci tree and commit:

  0608d4dbaf4e ("net/mlx5e: Unify slow PCI heuristic")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 884337f88589,0aab3afc6885..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@@ -3880,16 -4026,50 +4033,20 @@@ void mlx5e_build_default_indir_rqt(u32 
  		indirection_rqt[i] = i % num_channels;
  }
  
- static bool cqe_compress_heuristic(u32 link_speed, u32 pci_bw)
 -static int mlx5e_get_pci_bw(struct mlx5_core_dev *mdev, u32 *pci_bw)
 -{
 -	enum pcie_link_width width;
 -	enum pci_bus_speed speed;
 -	int err = 0;
 -
 -	err = pcie_get_minimum_link(mdev->pdev, &speed, &width);
 -	if (err)
 -		return err;
 -
 -	if (speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
 -		return -EINVAL;
 -
 -	switch (speed) {
 -	case PCIE_SPEED_2_5GT:
 -		*pci_bw = 2500 * width;
 -		break;
 -	case PCIE_SPEED_5_0GT:
 -		*pci_bw = 5000 * width;
 -		break;
 -	case PCIE_SPEED_8_0GT:
 -		*pci_bw = 8000 * width;
 -		break;
 -	default:
 -		return -EINVAL;
 -	}
 -
 -	return 0;
 -}
 -
+ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
  {
- 	return (link_speed && pci_bw &&
- 		(pci_bw < 40000) && (pci_bw < link_speed));
- }
+ 	u32 link_speed = 0;
+ 	u32 pci_bw = 0;
  
- static bool hw_lro_heuristic(u32 link_speed, u32 pci_bw)
- {
- 	return !(link_speed && pci_bw &&
- 		 (pci_bw <= 16000) && (pci_bw < link_speed));
+ 	mlx5e_get_max_linkspeed(mdev, &link_speed);
 -	mlx5e_get_pci_bw(mdev, &pci_bw);
++	pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
+ 	mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
+ 			   link_speed, pci_bw);
+ 
+ #define MLX5E_SLOW_PCI_RATIO (2)
+ 
+ 	return link_speed && pci_bw &&
+ 		link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
  }
  
  void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek
From: NeilBrown @ 2018-04-03  3:41 UTC (permalink / raw)
  To: Herbert Xu, Andreas Gruenbacher
  Cc: cluster-devel, netdev, Tom Herbert, LKML, Thomas Graf
In-Reply-To: <20180329170626.GA24218@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 1184 bytes --]

On Fri, Mar 30 2018, Herbert Xu wrote:

> On Thu, Mar 29, 2018 at 06:52:34PM +0200, Andreas Gruenbacher wrote:
>>
>> Should rhashtable_walk_peek be kept around even if there are no more
>> users? I have my doubts.
>
> Absolutely.  All netlink dumps using rhashtable_walk_next are buggy
> and need to switch over to rhashtable_walk_peek.  As otherwise
> the object that triggers the out-of-space condition will be skipped
> upon resumption.

Do we really need a rhashtable_walk_peek() interface?
I imagine that a seqfile ->start function can do:

  if (*ppos == 0 && last_pos != 0) {
  	rhashtable_walk_exit(&iter);
        rhashtable_walk_enter(&table, &iter);
        last_pos = 0;
  }
  rhashtable_walk_start(&iter);
  if (*ppos == last_pos && iter.p)
  	return iter.p;
  last_pos = *ppos;
  return rhashtable_walk_next(&iter)


and the ->next function just does

  last_pos = *ppos;
  *ppos += 1;
  do p = rhashtable_walk_next(&iter); while (IS_ERR(p));
  return p;

It might be OK to have a function call instead of expecting people to
use iter.p directly.

static inline void *rhashtable_walk_prev(struct rhashtable_iter *iter)
{
	return iter->p;
}

Thoughts?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
From: Richard Cochran @ 2018-04-03  3:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, netdev, devicetree, David Miller, Mark Rutland,
	Miroslav Lichvar, Rob Herring, Willem de Bruijn
In-Reply-To: <4c3c939f-4cbe-51db-c141-950b85a5b4de@gmail.com>

On Sun, Mar 25, 2018 at 04:01:49PM -0700, Florian Fainelli wrote:
> The best that I can think about and it still is a hack in some way, is
> to you have your time stamping driver create a proxy mii_bus whose
> purpose is just to hook to mdio/phy_device events (such as link changes)
> in order to do what is necessary, or at least, this would indicate its
> transparent nature towards the MDIO/MDC lines...

That won't work at all, AFAICT.  There is only one mii_bus per netdev,
that is one that is attached to the phydev.
 
> Tangential: the existing PHY time stamping logic should probably be
> generalized to a mdio_device (which the phy_device is a specialized
> superset of) instead of to the phy_device. This would still allow
> existing use cases but it would also allow us to support possible "pure
> MDIO" devices would that become some thing in the future.

So this is exactly what I did.  The time stamping methods were pushed
down into the mdio_device.  The active device (mdiots pointer) points
either to a non-PHY mdio_device or to the mdio_device embedded in the
phydev.

Thanks,
Richard

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox