Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH net-next] r8169: allocate rx buffers using alloc_pages_node
From: Hayes Wang @ 2019-08-08  2:00 UTC (permalink / raw)
  To: Heiner Kallweit, nic_swsd, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <7d79d794-b41c-101f-0720-59eea88bf9ab@gmail.com>

> From: Heiner Kallweit [mailto:hkallweit1@gmail.com]
> Sent: Thursday, August 08, 2019 3:38 AM
> To: nic_swsd; David Miller
> Cc: netdev@vger.kernel.org
> Subject: [PATCH net-next] r8169: allocate rx buffers using alloc_pages_node
> 
> We allocate 16kb per rx buffer, so we can avoid some overhead by using
> alloc_pages_node directly instead of bothering kmalloc_node. Due to
> this change buffers are page-aligned now, therefore the alignment check
> can be removed.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Acked-by: Hayes Wang <hayeswang@realtek.com>


^ permalink raw reply

* Re: [PATCH v2] Fix non-kerneldoc comment in realtek/rtlwifi/usb.c
From: Larry Finger @ 2019-08-08  2:07 UTC (permalink / raw)
  To: Valdis Klētnieks, Ping-Ke Shih, Kalle Valo, David S. Miller
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <34195.1565229118@turing-police>

On 8/7/19 8:51 PM, Valdis Klētnieks wrote:
> Fix spurious warning message when building with W=1:
> 
>    CC [M]  drivers/net/wireless/realtek/rtlwifi/usb.o
> drivers/net/wireless/realtek/rtlwifi/usb.c:243: warning: Cannot understand  * on line 243 - I thought it was a doc line
> drivers/net/wireless/realtek/rtlwifi/usb.c:760: warning: Cannot understand  * on line 760 - I thought it was a doc line
> drivers/net/wireless/realtek/rtlwifi/usb.c:790: warning: Cannot understand  * on line 790 - I thought it was a doc line
> 
> Clean up the comment format.
> 
> Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
> 
> ---
> Changes since v1:  Larry Finger pointed out the patch wasn't checkpatch-clean.
> 
> diff --git a/drivers/net/wireless/realtek/rtlwifi/usb.c b/drivers/net/wireless/realtek/rtlwifi/usb.c
> index 34d68dbf4b4c..4b59f3b46b28 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/usb.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c
> @@ -239,10 +239,7 @@ static void _rtl_usb_io_handler_release(struct ieee80211_hw *hw)
>   	mutex_destroy(&rtlpriv->io.bb_mutex);
>   }
>   
> -/**
> - *
> - *	Default aggregation handler. Do nothing and just return the oldest skb.
> - */
> +/*	Default aggregation handler. Do nothing and just return the oldest skb.  */
>   static struct sk_buff *_none_usb_tx_aggregate_hdl(struct ieee80211_hw *hw,
>   						  struct sk_buff_head *list)
>   {
> @@ -756,11 +753,6 @@ static int rtl_usb_start(struct ieee80211_hw *hw)
>   	return err;
>   }
>   
> -/**
> - *
> - *
> - */
> -
>   /*=======================  tx =========================================*/
>   static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>   {
> @@ -786,11 +778,7 @@ static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>   	usb_kill_anchored_urbs(&rtlusb->tx_submitted);
>   }
>   
> -/**
> - *
> - * We may add some struct into struct rtl_usb later. Do deinit here.
> - *
> - */
> +/* We may add some struct into struct rtl_usb later. Do deinit here.  */
>   static void rtl_usb_deinit(struct ieee80211_hw *hw)
>   {
>   	rtl_usb_cleanup(hw);

I missed that the subject line should be "rtwifi: Fix ....". Otherwise it is OK.

Larry



^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Ira Weiny @ 2019-08-08  2:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jan Kara, John Hubbard, Matthew Wilcox, john.hubbard,
	Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Jason Gunthorpe, Jérôme Glisse, LKML,
	amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx, kvm,
	linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel
In-Reply-To: <20190807084649.GQ11812@dhcp22.suse.cz>

On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
> On Wed 07-08-19 10:37:26, Jan Kara wrote:
> > On Fri 02-08-19 12:14:09, John Hubbard wrote:
> > > On 8/2/19 7:52 AM, Jan Kara wrote:
> > > > On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> > > > > On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > > > > > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > > > > > On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
> > > > > > > [...]
> > > > > > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > > > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > > > > > call sites, and will take some time.
> > > > > > > 
> > > > > > > How do we make sure this is the case and it will remain the case in the
> > > > > > > future? There must be some automagic to enforce/check that. It is simply
> > > > > > > not manageable to do it every now and then because then 3) will simply
> > > > > > > be never safe.
> > > > > > > 
> > > > > > > Have you considered coccinele or some other scripted way to do the
> > > > > > > transition? I have no idea how to deal with future changes that would
> > > > > > > break the balance though.
> > > 
> > > Hi Michal,
> > > 
> > > Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
> > > enough to know which put_page()'s to convert). However, there is a debug
> > > option planned: a yet-to-be-posted commit [1] uses struct page extensions
> > > (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
> > > a redundant counter. That allows:
> > > 
> > > void __put_page(struct page *page)
> > > {
> > > 	...
> > > 	/* Someone called put_page() instead of put_user_page() */
> > > 	WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
> > > 
> > > > > > 
> > > > > > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > > > > > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > > > > > references got converted by using this wrapper instead of gup. The
> > > > > > counterpart would then be more logically named as unpin_page() or whatever
> > > > > > instead of put_user_page().  Sure this is not completely foolproof (you can
> > > > > > create new callsite using vaddr_pin_pages() and then just drop refs using
> > > > > > put_page()) but I suppose it would be a high enough barrier for missed
> > > > > > conversions... Thoughts?
> > > 
> > > The debug option above is still a bit simplistic in its implementation
> > > (and maybe not taking full advantage of the data it has), but I think
> > > it's preferable, because it monitors the "core" and WARNs.
> > > 
> > > Instead of the wrapper, I'm thinking: documentation and the passage of
> > > time, plus the debug option (perhaps enhanced--probably once I post it
> > > someone will notice opportunities), yes?
> > 
> > So I think your debug option and my suggested renaming serve a bit
> > different purposes (and thus both make sense). If you do the renaming, you
> > can just grep to see unconverted sites. Also when someone merges new GUP
> > user (unaware of the new rules) while you switch GUP to use pins instead of
> > ordinary references, you'll get compilation error in case of renaming
> > instead of hard to debug refcount leak without the renaming. And such
> > conflict is almost bound to happen given the size of GUP patch set... Also
> > the renaming serves against the "coding inertia" - i.e., GUP is around for
> > ages so people just use it without checking any documentation or comments.
> > After switching how GUP works, what used to be correct isn't anymore so
> > renaming the function serves as a warning that something has really
> > changed.
> 
> Fully agreed!

Ok Prior to this I've been basing all my work for the RDMA/FS DAX stuff in
Johns put_user_pages()...  (Including when I proposed failing truncate with a
lease in June [1])

However, based on the suggestions in that thread it became clear that a new
interface was going to need to be added to pass in the "RDMA file" information
to GUP to associate file pins with the correct processes...

I have many drawings on my white board with "a whole lot of lines" on them to
make sure that if a process opens a file, mmaps it, pins it with RDMA, _closes_
it, and ummaps it; that the resulting file pin can still be traced back to the
RDMA context and all the processes which may have access to it....  No matter
where the original context may have come from.  I believe I have accomplished
that.

Before I go on, I would like to say that the "imbalance" of get_user_pages()
and put_page() bothers me from a purist standpoint...  However, since this
discussion cropped up I went ahead and ported my work to Linus' current master
(5.3-rc3+) and in doing so I only had to steal a bit of Johns code...  Sorry
John...  :-(

I don't have the commit messages all cleaned up and I know there may be some
discussion on these new interfaces but I wanted to throw this series out there
because I think it may be what Jan and Michal are driving at (or at least in
that direction.

Right now only RDMA and DAX FS's are supported.  Other users of GUP will still
fail on a DAX file and regular files will still be at risk.[2]

I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:

https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3

I think the most relevant patch to this conversation is:

https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6

I stole Jans suggestion for a name as the name I used while prototyping was
pretty bad...  So Thanks Jan...  ;-)

Also thanks to John for his contribution on some of this.  I'm still tweaking
put_user_pages under the hood on the DAX path.

Ira

[1] https://lwn.net/Articles/790544/

[2] I've been looking into how to support io_uring next but I've had some issue
getting a test program to actually call GUP in that code path...  :-(

[3] If it would be easier I can just throw an RFC on the list but right now the
cover letter and some of the commit messages are full of the old stuff and
various ideas I have had...

> 
> > Your refcount debug patches are good to catch bugs in the conversions done
> > but that requires you to be able to excercise the code path in the first
> > place which may require particular HW or so, and you also have to enable
> > the debug option which means you already aim at verifying the GUP
> > references are treated properly.
> > 
> > 								Honza
> > 
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply

* linux-next: manual merge of the bpf-next tree with Linus' tree
From: Stephen Rothwell @ 2019-08-08  2:53 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Andrii Nakryiko

[-- Attachment #1: Type: text/plain, Size: 2054 bytes --]

Hi all,

Today's linux-next merge of the bpf-next tree got a conflict in:

  tools/lib/bpf/libbpf.c

between commit:

  1d4126c4e119 ("libbpf: sanitize VAR to conservative 1-byte INT")

from Linus' tree and commit:

  b03bc6853c0e ("libbpf: convert libbpf code to use new btf helpers")

from the bpf-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc tools/lib/bpf/libbpf.c
index 2b57d7ea7836,3abf2dd1b3b5..000000000000
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@@ -1370,22 -1374,16 +1372,21 @@@ static void bpf_object__sanitize_btf(st
  
  	for (i = 1; i <= btf__get_nr_types(btf); i++) {
  		t = (struct btf_type *)btf__type_by_id(btf, i);
- 		kind = BTF_INFO_KIND(t->info);
  
- 		if (!has_datasec && kind == BTF_KIND_VAR) {
+ 		if (!has_datasec && btf_is_var(t)) {
  			/* replace VAR with INT */
  			t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
 -			t->size = sizeof(int);
 -			*(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
 +			/*
 +			 * using size = 1 is the safest choice, 4 will be too
 +			 * big and cause kernel BTF validation failure if
 +			 * original variable took less than 4 bytes
 +			 */
 +			t->size = 1;
- 			*(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
- 		} else if (!has_datasec && kind == BTF_KIND_DATASEC) {
++			*(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
+ 		} else if (!has_datasec && btf_is_datasec(t)) {
  			/* replace DATASEC with STRUCT */
- 			struct btf_var_secinfo *v = (void *)(t + 1);
- 			struct btf_member *m = (void *)(t + 1);
+ 			const struct btf_var_secinfo *v = btf_var_secinfos(t);
+ 			struct btf_member *m = btf_members(t);
  			struct btf_type *vt;
  			char *name;
  

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-08  3:46 UTC (permalink / raw)
  To: Ira Weiny, Michal Hocko
  Cc: Jan Kara, Matthew Wilcox, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Dave Hansen, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel
In-Reply-To: <20190808023637.GA1508@iweiny-DESK2.sc.intel.com>

On 8/7/19 7:36 PM, Ira Weiny wrote:
> On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
>> On Wed 07-08-19 10:37:26, Jan Kara wrote:
>>> On Fri 02-08-19 12:14:09, John Hubbard wrote:
>>>> On 8/2/19 7:52 AM, Jan Kara wrote:
>>>>> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>>>>>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>>>>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>>>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
  [...]
> Before I go on, I would like to say that the "imbalance" of get_user_pages()
> and put_page() bothers me from a purist standpoint...  However, since this
> discussion cropped up I went ahead and ported my work to Linus' current master
> (5.3-rc3+) and in doing so I only had to steal a bit of Johns code...  Sorry
> John...  :-(
> 
> I don't have the commit messages all cleaned up and I know there may be some
> discussion on these new interfaces but I wanted to throw this series out there
> because I think it may be what Jan and Michal are driving at (or at least in
> that direction.
> 
> Right now only RDMA and DAX FS's are supported.  Other users of GUP will still
> fail on a DAX file and regular files will still be at risk.[2]
> 
> I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
> 
> https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
> 
> I think the most relevant patch to this conversation is:
> 
> https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6
> 

ohhh...can you please avoid using the old __put_user_pages_dirty()
function? I thought I'd caught things early enough to get away with
the rename and deletion of that. You could either:

a) open code an implementation of vaddr_put_pages_dirty_lock() that
doesn't call any of the *put_user_pages_dirty*() variants, or

b) include my first patch ("") are part of your series, or

c) base this on Andrews's tree, which already has merged in my first patch.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH v2 1/1] ixgbe: sync the first fragment unconditionally
From: Alexander Duyck @ 2019-08-08  3:48 UTC (permalink / raw)
  To: Jacob Wen
  Cc: Firo Yang, davem@davemloft.net, jeffrey.t.kirsher@intel.com,
	alexander.h.duyck@linux.intel.com,
	intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <901704f1-163d-9dd8-4d20-93fa19f4435d@oracle.com>

On Wed, Aug 7, 2019 at 6:58 PM Jacob Wen <jian.w.wen@oracle.com> wrote:
>
>
> On 8/7/19 4:38 PM, Firo Yang wrote:
> > The 08/07/2019 15:56, Jacob Wen wrote:
> >> I think the description is not correct. Consider using something like below.
> > Thank you for comments.
> >
> >> In Xen environment, due to memory fragmentation ixgbe may allocate a 'DMA'
> >> buffer with pages that are not physically contiguous.
> > Actually, I didn't look into the reason why ixgbe got a DMA buffer which
> > was mapped to Xen-swiotlb area.
> Yes. I was wrong. You don't need to tell the exact reason.
> >
> > But I don't think this issue relates to phsical memory contiguity because, in
> > our case, one ixgbe_rx_buffer only associates at most one page.
>
> This is interesting.
>
> I guess the performance of the NIC in your environment is not good due
> to the extra cost of bounce buffer.

If I recall correctly the Rx performance for ixgbe shouldn't be too
bad even with a bounce buffer. The cost for map/unmap are expensive
for a bounce buffer setup but the syncs are just copies so they are
pretty cheap in comparison. The driver can take advantage of that on
Rx since it leaves the pages mapped and just syncs the portion of the
pages that are used.

Now if you are hitting the bounce buffer on the Tx side that is
another matter as you have to allocate the buffer on demand and that
is quite expensive.

^ permalink raw reply

* [PATCH v3 1/1] ixgbe: sync the first fragment unconditionally
From: Firo Yang @ 2019-08-08  4:03 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, alexander.duyck@gmail.com,
	alexander.h.duyck@linux.intel.com, jian.w.wen@oracle.com,
	jeffrey.t.kirsher@intel.com, linux-kernel@vger.kernel.org,
	maciejromanfijalkowski@gmail.com,
	intel-wired-lan@lists.osuosl.org, Firo Yang

In Xen environment, if Xen-swiotlb is enabled, ixgbe driver
could possibly allocate a page, DMA memory buffer, for the first
fragment which is not suitable for Xen-swiotlb to do DMA operations.
Xen-swiotlb have to internally allocate another page for doing DMA
operations. This mechanism requires syncing the data from the internal
page to the page which ixgbe sends to upper network stack. However,
since commit f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path"), the unmap operation is performed with
DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.
Since the sync isn't performed, the upper network stack could receive
a incomplete network packet. By incomplete, it means the linear data
on the first fragment(between skb->head and skb->end) is invalid. So 
we have to copy the data from the internal xen-swiotlb page to the page 
which ixgbe sends to upper network stack through the sync operation.

More details from Alexander Duyck:
Specifically since we are mapping the frame with
DMA_ATTR_SKIP_CPU_SYNC we have to unmap with that as well. As a result
a sync is not performed on an unmap and must be done manually as we
skipped it for the first frag. As such we need to always sync before
possibly performing a page unmap operation.

Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path")
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Firo Yang <firo.yang@suse.com>
---
Changes from v2:
 * Added details on the problem caused by skipping the sync.
 * Added more explanation from Alexander Duyck.

Changes from v1:
 * Imporved the patch description.
 * Added Reviewed-by: and Fixes: as suggested by Alexander Duyck.

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbaf712d6529..200de9838096 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,13 +1825,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
 static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 				struct sk_buff *skb)
 {
-	/* if the page was released unmap it, else just sync our portion */
-	if (unlikely(IXGBE_CB(skb)->page_released)) {
-		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
-				     ixgbe_rx_pg_size(rx_ring),
-				     DMA_FROM_DEVICE,
-				     IXGBE_RX_DMA_ATTR);
-	} else if (ring_uses_build_skb(rx_ring)) {
+	if (ring_uses_build_skb(rx_ring)) {
 		unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
 
 		dma_sync_single_range_for_cpu(rx_ring->dev,
@@ -1848,6 +1842,14 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 					      skb_frag_size(frag),
 					      DMA_FROM_DEVICE);
 	}
+
+	/* If the page was released, just unmap it. */
+	if (unlikely(IXGBE_CB(skb)->page_released)) {
+		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+				     ixgbe_rx_pg_size(rx_ring),
+				     DMA_FROM_DEVICE,
+				     IXGBE_RX_DMA_ATTR);
+	}
 }
 
 /**
-- 
2.16.4


^ permalink raw reply related

* Slow internet? Don't miss out on this WiFi booster
From:  Lester Hardy @ 2019-08-08  4:08 UTC (permalink / raw)
  To: netdev

Internet providers make big money by overcharging you for faster internet lines. Could this little device really be the one solution that we've all been waiting for?

Frequency: 2.4Ghz
Wireless Rate: 300Mbps
Interface: 1 10/100Mbps WAN/LAN RJ45 Ports

Amazing new technology find out here! http://bit.ly/asdf411gt

------------------------
If you'd prefer not to receive future emails, Unsubscribe Here http://bit.ly/itr4fgy
11041 Santa Monica Blvd. #301 Los Angeles, CA 90025


^ permalink raw reply

* Re: [PATCH net-next v4 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Tao Ren @ 2019-08-08  4:24 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, Florian Fainelli, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org
In-Reply-To: <cfd6e14e-a447-aedb-5bd6-bf65b4b6f98a@gmail.com>

Hi Heiner,

On 8/7/19 12:18 PM, Heiner Kallweit wrote:
> On 06.08.2019 23:42, Tao Ren wrote:
>> Hi Andrew / Heiner / Vladimir,
>>
>> On 8/6/19 2:09 PM, Tao Ren wrote:
>>> The BCM54616S PHY cannot work properly in RGMII->1000Base-KX mode (for
>>> example, on Facebook CMM BMC platform), mainly because genphy functions
>>> are designed for copper links, and 1000Base-X (clause 37) auto negotiation
>>> needs to be handled differently.
>>>
>>> This patch enables 1000Base-X support for BCM54616S by customizing 3
>>> driver callbacks:
>>>
>>>   - probe: probe callback detects PHY's operation mode based on
>>>     INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
>>>     Control register.
>>>
>>>   - config_aneg: bcm54616s_config_aneg_1000bx function is added for auto
>>>     negotiation in 1000Base-X mode.
>>>
>>>   - read_status: BCM54616S and BCM5482 PHY share the same read_status
>>>     callback which manually set link speed and duplex mode in 1000Base-X
>>>     mode.
>>>
>>> Signed-off-by: Tao Ren <taoren@fb.com>
>>
>> I customized config_aneg function for BCM54616S 1000Base-X mode and link-down issue is also fixed: the patch is tested on Facebook CMM and Minipack BMC and everything looks normal. Please kindly review when you have bandwidth and let me know if you have further suggestions.
>>
>> BTW, I would be happy to help if we decide to add a set of genphy functions for clause 37, although that may mean I need more help/guidance from you :-)
> 
> You want to have standard clause 37 aneg and this should be generic in phylib.
> I hacked together a first version that is compile-tested only:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_1143631_&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=iYElT7HC77pRZ3byVvW8ng&m=ZJArOJvHqNkqvs1x8l9HjfxjCN8e5xJpPz2YViBuKRA&s=EskpfBQtu9IBVeb96dv-sz76xIz4tJK5-lD4-qdIyWI&e= 
> It supports fixed mode too.
> 
> It doesn't support half duplex mode because phylib doesn't know 1000BaseX HD yet.
> Not sure whether half duplex mode is used at all in reality.
> 
> You could test the new core functions in your own config_aneg and read_status
> callback implementations.

Thank you very much for the help! I'm planning to add these functions but I haven't started yet because I'm still going through clause 37 :-)

Let me apply your patch and run some test on my platform. Will share you results tomorrow.


Cheers,

Tao

^ permalink raw reply

* Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
From: Yonghong Song @ 2019-08-08  4:24 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov, daniel@iogearbox.net
  Cc: andrii.nakryiko@gmail.com, Kernel Team, Masahiro Yamada,
	Arnaldo Carvalho de Melo, Jiri Olsa, Sam Ravnborg
In-Reply-To: <20190808003215.1462821-1-andriin@fb.com>



On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> Make .BTF section allocated and expose its contents through sysfs.
> 
> /sys/kernel/btf directory is created to contain all the BTFs present
> inside kernel. Currently there is only kernel's main BTF, represented as
> /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> 
> Current approach relies on a few pieces coming together:
> 1. pahole is used to take almost final vmlinux image (modulo .BTF and
>     kallsyms) and generate .BTF section by converting DWARF info into
>     BTF. This section is not allocated and not mapped to any segment,
>     though, so is not yet accessible from inside kernel at runtime.
> 2. objcopy dumps .BTF contents into binary file and subsequently
>     convert binary file into linkable object file with automatically
>     generated symbols _binary__btf_kernel_bin_start and
>     _binary__btf_kernel_bin_end, pointing to start and end, respectively,
>     of BTF raw data.
> 3. final vmlinux image is generated by linking this object file (and
>     kallsyms, if necessary). sysfs_btf.c then creates
>     /sys/kernel/btf/kernel file and exposes embedded BTF contents through
>     it. This allows, e.g., libbpf and bpftool access BTF info at
>     well-known location, without resorting to searching for vmlinux image
>     on disk (location of which is not standardized and vmlinux image
>     might not be even available in some scenarios, e.g., inside qemu
>     during testing).
> 
> Alternative approach using .incbin assembler directive to embed BTF
> contents directly was attempted but didn't work, because sysfs_proc.o is
> not re-compiled during link-vmlinux.sh stage. This is required, though,
> to update embedded BTF data (initially empty data is embedded, then
> pahole generates BTF info and we need to regenerate sysfs_btf.o with
> updated contents, but it's too late at that point).
> 
> If BTF couldn't be generated due to missing or too old pahole,
> sysfs_btf.c handles that gracefully by detecting that
> _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> /sys/kernel/btf at all.
> 
> v1->v2:
> - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> 
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>   kernel/bpf/Makefile     |  3 +++
>   kernel/bpf/sysfs_btf.c  | 52 ++++++++++++++++++++++++++++++++++++++
>   scripts/link-vmlinux.sh | 55 +++++++++++++++++++++++++++--------------
>   3 files changed, 91 insertions(+), 19 deletions(-)
>   create mode 100644 kernel/bpf/sysfs_btf.c
> 
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 29d781061cd5..e1d9adb212f9 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
>   ifeq ($(CONFIG_INET),y)
>   obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
>   endif
> +ifeq ($(CONFIG_SYSFS),y)
> +obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
> +endif
> diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> new file mode 100644
> index 000000000000..ac06ce1d62e8
> --- /dev/null
> +++ b/kernel/bpf/sysfs_btf.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Provide kernel BTF information for introspection and use by eBPF tools.
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kobject.h>
> +#include <linux/init.h>
> +
> +/* See scripts/link-vmlinux.sh, gen_btf() func for details */
> +extern char __weak _binary__btf_kernel_bin_start[];
> +extern char __weak _binary__btf_kernel_bin_end[];
> +
> +static ssize_t
> +btf_kernel_read(struct file *file, struct kobject *kobj,
> +		struct bin_attribute *bin_attr,
> +		char *buf, loff_t off, size_t len)
> +{
> +	memcpy(buf, _binary__btf_kernel_bin_start + off, len);
> +	return len;
> +}
> +
> +static struct bin_attribute btf_kernel_attr __ro_after_init = {
> +	.attr = {
> +		.name = "kernel",
> +		.mode = 0444,
> +	},
> +	.read = btf_kernel_read,
> +};
> +
> +static struct bin_attribute *btf_attrs[] __ro_after_init = {
> +	&btf_kernel_attr,
> +	NULL,
> +};
> +
> +static struct attribute_group btf_group_attr __ro_after_init = {
> +	.name = "btf",
> +	.bin_attrs = btf_attrs,
> +};
> +
> +static int __init btf_kernel_init(void)
> +{
> +	if (!_binary__btf_kernel_bin_start)
> +		return 0;
> +
> +	btf_kernel_attr.size = _binary__btf_kernel_bin_end -
> +			       _binary__btf_kernel_bin_start;
> +
> +	return sysfs_create_group(kernel_kobj, &btf_group_attr);
> +}
> +
> +subsys_initcall(btf_kernel_init);
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index a7124f895b24..e05abe19b11f 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -56,8 +56,8 @@ modpost_link()
>   }
>   
>   # Link of vmlinux
> -# ${1} - optional extra .o files
> -# ${2} - output file
> +# ${1} - output file
> +# ${@:2} - optional extra .o files
>   vmlinux_link()
>   {
>   	local lds="${objtree}/${KBUILD_LDS}"
> @@ -70,9 +70,9 @@ vmlinux_link()
>   			--start-group				\
>   			${KBUILD_VMLINUX_LIBS}			\
>   			--end-group				\
> -			${1}"
> +			${@:2}"
>   
> -		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}	\
> +		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${1}	\
>   			-T ${lds} ${objects}
>   	else
>   		objects="-Wl,--whole-archive			\
> @@ -81,9 +81,9 @@ vmlinux_link()
>   			-Wl,--start-group			\
>   			${KBUILD_VMLINUX_LIBS}			\
>   			-Wl,--end-group				\
> -			${1}"
> +			${@:2}"
>   
> -		${CC} ${CFLAGS_vmlinux} -o ${2}			\
> +		${CC} ${CFLAGS_vmlinux} -o ${1}			\
>   			-Wl,-T,${lds}				\
>   			${objects}				\
>   			-lutil -lrt -lpthread
> @@ -92,23 +92,34 @@ vmlinux_link()
>   }
>   
>   # generate .BTF typeinfo from DWARF debuginfo
> +# ${1} - vmlinux image
> +# ${2} - file to dump raw BTF data into
>   gen_btf()
>   {
> -	local pahole_ver;
> +	local pahole_ver
> +	local bin_arch
>   
>   	if ! [ -x "$(command -v ${PAHOLE})" ]; then
>   		info "BTF" "${1}: pahole (${PAHOLE}) is not available"
> -		return 0
> +		return 1
>   	fi
>   
>   	pahole_ver=$(${PAHOLE} --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/')
>   	if [ "${pahole_ver}" -lt "113" ]; then
>   		info "BTF" "${1}: pahole version $(${PAHOLE} --version) is too old, need at least v1.13"
> -		return 0
> +		return 1
>   	fi
>   
> -	info "BTF" ${1}
> +	info "BTF" ${2}
> +	vmlinux_link ${1}
>   	LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
> +
> +	# dump .BTF section into raw binary file to link with final vmlinux
> +	bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
> +		cut -d, -f1 | cut -d' ' -f2)
> +	${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
> +	${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
> +		--rename-section .data=.BTF .btf.kernel.bin ${2}

Currently, the binary size on my config is about 2.6MB. Do you think
we could or need to compress it to make it smaller? I tried gzip
and the compressed size is 0.9MB.

>   }
>   
>   # Create ${2} .o file with all symbols from the ${1} object file
> @@ -153,6 +164,7 @@ sortextable()
>   # Delete output files in case of error
>   cleanup()
>   {
> +	rm -f .btf.*
>   	rm -f .tmp_System.map
>   	rm -f .tmp_kallsyms*
>   	rm -f .tmp_vmlinux*
> @@ -215,6 +227,13 @@ ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
>   info MODINFO modules.builtin.modinfo
>   ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
>   
> +btf_kernel_bin_o=""
> +if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> +	if gen_btf .tmp_vmlinux1 .btf.kernel.bin.o ; then
> +		btf_kernel_bin_o=.btf.kernel.bin.o
> +	fi
> +fi
> +
>   kallsymso=""
>   kallsyms_vmlinux=""
>   if [ -n "${CONFIG_KALLSYMS}" ]; then
> @@ -246,11 +265,14 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
>   	kallsyms_vmlinux=.tmp_vmlinux2
>   
>   	# step 1
> -	vmlinux_link "" .tmp_vmlinux1
> +	# skip building .tmp_vmlinux1 if gen_btf() already did that
> +	if [ -z "${btf_kernel_bin_o}" ]; then
> +		vmlinux_link .tmp_vmlinux1
> +	fi
>   	kallsyms .tmp_vmlinux1 .tmp_kallsyms1.o
>   
>   	# step 2
> -	vmlinux_link .tmp_kallsyms1.o .tmp_vmlinux2
> +	vmlinux_link .tmp_vmlinux2 .tmp_kallsyms1.o ${btf_kernel_bin_o}
>   	kallsyms .tmp_vmlinux2 .tmp_kallsyms2.o
>   
>   	# step 3
> @@ -261,18 +283,13 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
>   		kallsymso=.tmp_kallsyms3.o
>   		kallsyms_vmlinux=.tmp_vmlinux3
>   
> -		vmlinux_link .tmp_kallsyms2.o .tmp_vmlinux3
> -
> +		vmlinux_link .tmp_vmlinux3 .tmp_kallsyms2.o ${btf_kernel_bin_o}
>   		kallsyms .tmp_vmlinux3 .tmp_kallsyms3.o
>   	fi
>   fi
>   
>   info LD vmlinux
> -vmlinux_link "${kallsymso}" vmlinux
> -
> -if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> -	gen_btf vmlinux
> -fi
> +vmlinux_link vmlinux "${kallsymso}" "${btf_kernel_bin_o}"
>   
>   if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
>   	info SORTEX vmlinux
> 

^ permalink raw reply

* Re: [patch net-next rfc 2/7] net: introduce name_node struct to be used in hashlist
From: kbuild test robot @ 2019-08-08  4:34 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: kbuild-all, netdev, davem, jakub.kicinski, sthemmin, dsahern,
	dcbw, mkubecek, andrew, parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-3-jiri@resnulli.us>

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]

Hi Jiri,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jiri-Pirko/net-introduce-alternative-names-for-network-interfaces/20190720-094639
config: arc-defconfig (attached as .config)
compiler: arc-elf-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   {standard input}: Assembler messages:
   {standard input}:147: Error: inappropriate arguments for opcode 'st'
>> {standard input}:150: Error: Insn j_s has a jump/branch instruction j_s in its delay slot.
   {standard input}:175: Error: inappropriate arguments for opcode 'st'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8655 bytes --]

^ permalink raw reply

* Re: [PATCH v5 bpf-next] BPF: helpers: New helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-08  4:44 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com,
	quentin.monnet@netronome.com
In-Reply-To: <20190808012240.htbgpv2mhktvig5h@dev00>



On 8/7/19 6:22 PM, Carlos Antonio Neira Bustos wrote:
> The code has been modified to avoid syscalls that could sleep.
> Please let me know if any other modification is needed.
> 
>  From be0384c0fa209a78c1567936e8db4e35b9a7c0f8 Mon Sep 17 00:00:00 2001
> From: Carlos <cneirabustos@gmail.com>
> Date: Wed, 7 Aug 2019 20:04:30 -0400
> Subject: [PATCH] [PATCH v5 bpf-next] BPF: New helper to obtain namespace data
>   from current task
> 
> This helper obtains the active namespace from current and returns pid, tgid,
> device and namespace id as seen from that namespace, allowing to instrument
> a process inside a container.
> Device is read from /proc/self/ns/pid, as in the future it's possible that
> different pid_ns files may belong to different devices, according
> to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
> conference.
> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> scripts but this helper returns the pid as seen by the root namespace which is
> fine when a bcc script is not executed inside a container.
> When the process of interest is inside a container, pid filtering will not work
> if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
> returning the pid as it's seen by the current namespace where the script is
> executing.
> 
> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> used to do pid filtering even inside a container.
> 
> For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):
> 
>          u32 pid = bpf_get_current_pid_tgid() >> 32;
>          if (pid != <pid_arg_passed_in>)
>                  return 0;
> Could be modified to use bpf_get_current_pidns_info() as follows:
> 
>          struct bpf_pidns pidns;
>          bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
>          u32 pid = pidns.tgid;
>          u32 nsid = pidns.nsid;
>          if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
>                  return 0;
> 
> To find out the name PID namespace id of a process, you could use this command:
> 
> $ ps -h -o pidns -p <pid_of_interest>
> 
> Or this other command:
> 
> $ ls -Li /proc/<pid_of_interest>/ns/pid
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/namei.c                                         |   2 +-
>   include/linux/bpf.h                                |   1 +
>   include/linux/namei.h                              |   4 +
>   include/uapi/linux/bpf.h                           |  29 ++++-
>   kernel/bpf/core.c                                  |   1 +
>   kernel/bpf/helpers.c                               |  78 ++++++++++++
>   kernel/trace/bpf_trace.c                           |   2 +
>   samples/bpf/Makefile                               |   3 +
>   samples/bpf/trace_ns_info_user.c                   |  35 ++++++
>   samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++
>   tools/include/uapi/linux/bpf.h                     |  29 ++++-
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
>   15 files changed, 418 insertions(+), 4 deletions(-)
>   create mode 100644 samples/bpf/trace_ns_info_user.c
>   create mode 100644 samples/bpf/trace_ns_info_user_kern.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c

Carlos,

Thanks for your patch. Maybe the following changes can be made
for easier review and eventual merging if accepted.

(1). The single patch should be broken into multiple patches:
      #1. fs/namei.c include/linux/namei.h
      #2. bpf kernel changes in include/linux/bpf.h,
          include/uapi/linux/bpf.h, kernel/bpf/core.c
          kernel/bpf/helpers.h kernel/trace/bpf_trace.c
      #3. sync uapi header tools/include/uapi/linux/bpf.h
      #4. tools/testing/selftests/bpf changes.
      #5. samples/bpf/ changes.
(2). Since you have more than one patch, please do provide
      a cover letter to describe use case as you did in the
      above commit. Please put the change logs after
      the original commit message.
      You can look at some other examples
      "git log" in the net-next or bpf-next repo.
(3). Please send to or cc BPF maintainers:
      Alexei Starovoitov <ast@kernel.org>
      Daniel Borkmann <daniel@iogearbox.net>

> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..d1eca36972d2 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> @@ -2355,6 +2354,7 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
>   	putname(name);
>   	return retval;
>   }
> +EXPORT_SYMBOL(filename_lookup);
>   
>   /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
>   static int path_parentat(struct nameidata *nd, unsigned flags,
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..2c24e8c71d46 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned int flags,
> +		    struct path *path, struct path *root);
> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..6f601f7106e2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,26 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL**  if unable to get ns, pid or tgid of current task.
> + *		Or if size_of_pidns is not valid.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
> + *		If unable to get the inode from /proc/self/ns/pid an error code
> + *		will be returned.
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2873,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3625,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..571f24077db2 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,78 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *name = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;
> +	int len = strlen(name) + 1;
> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;
> +
> +	pidns = task_active_pid_ns(current);
> +
> +	if (unlikely(!pidns))
> +		goto clear;
> +
> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +
> +	if (unlikely(!pid))
> +		goto clear;
> +
> +	tgid = task_tgid_nr_ns(current, pidns);
> +
> +	if (unlikely(!tgid))
> +		goto clear;
> +
> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;
> +
> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;
> +	}
> +
> +	memcpy((char *)tmp->name, name, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> +
> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;
> +	}
> +
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +
> +	return 0;
> +
> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func	= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 1d9be26b4edd..238453ff27d2 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
>   hostprogs-y += xdp_sample_pkts
>   hostprogs-y += ibumad
>   hostprogs-y += hbm
> +hostprogs-y += trace_ns_info
>   
>   # Libbpf dependencies
>   LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
> @@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
>   xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
>   ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
>   hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
> +trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
>   
>   # Tell kbuild to always build the programs
>   always := $(hostprogs-y)
> @@ -170,6 +172,7 @@ always += xdp_sample_pkts_kern.o
>   always += ibumad_kern.o
>   always += hbm_out_kern.o
>   always += hbm_edt_kern.o
> +always += trace_ns_info_user_kern.o
>   
>   KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
>   KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
> diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
> new file mode 100644
> index 000000000000..e06d08db6f30
> --- /dev/null
> +++ b/samples/bpf/trace_ns_info_user.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <linux/bpf.h>
> +#include <unistd.h>
> +#include "bpf/libbpf.h"
> +#include "bpf_load.h"
> +
> +/* This code was taken verbatim from tracex1_user.c, it's used
> + * to exercize bpf_get_current_pidns_info() helper call.
> + */
> +int main(int ac, char **argv)
> +{
> +	FILE *f;
> +	char filename[256];
> +
> +	snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
> +	printf("loading %s\n", filename);
> +
> +	if (load_bpf_file(filename)) {
> +		printf("%s", bpf_log_buf);
> +		return 1;
> +	}
> +
> +	f = popen("taskset 1 ping  localhost", "r");
> +	(void) f;
> +	read_trace_pipe();
> +	return 0;
> +}
> diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
> new file mode 100644
> index 000000000000..96675e02b707
> --- /dev/null
> +++ b/samples/bpf/trace_ns_info_user_kern.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/version.h>
> +#include <uapi/linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +typedef __u64 u64;
> +typedef __u32 u32;
> +
> +
> +/* kprobe is NOT a stable ABI
> + * kernel functions can be removed, renamed or completely change semantics.
> + * Number of arguments and their positions can change, etc.
> + * In such case this bpf+kprobe example will no longer be meaningful
> + */
> +
> +/* This will call bpf_get_current_pidns_info() to display pid and ns values
> + * as seen by the current namespace, on the far left you will see the pid as
> + * seen as by the root namespace.
> + */
> +
> +SEC("kprobe/__netif_receive_skb_core")
> +int bpf_prog1(struct pt_regs *ctx)
> +{
> +	char fmt[] = "nsid:%u, dev: %u,  pid:%u\n";
> +	struct bpf_pidns_info nsinfo;
> +	int ok = 0;
> +
> +	ok = bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo));
> +	if (ok == 0)
> +		bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid,
> +				 (u32) nsinfo.dev, (u32)nsinfo.pid);
> +
> +	return 0;
> +}
> +char _license[] SEC("license") = "GPL";
> +u32 _version SEC("version") = LINUX_VERSION_CODE;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4393bd4b2419..6f601f7106e2 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,26 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL**  if unable to get ns, pid or tgid of current task.
> + *		Or if size_of_pidns is not valid.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
> + *		If unable to get the inode from /proc/self/ns/pid an error code
> + *		will be returned.
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2873,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3625,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 3bd0f4a0336a..1f97b571b581 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>   	test_cgroup_storage test_select_reuseport test_section_names \
>   	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>   	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -	test_sockopt_multi test_tcp_rtt
> +	test_sockopt_multi test_tcp_rtt test_pidns
>   
>   BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>   TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 120aa86c58d3..c96795a9d983 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
>   static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
>   					  int ip_len, void *tcp, int tcp_len) =
>   	(void *) BPF_FUNC_tcp_gen_syncookie;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> +					 unsigned int buf_size) =
> +	(void *) BPF_FUNC_get_current_pidns_info;
>   
>   /* llvm builtin functions that eBPF C program may use to
>    * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..e1d2facfa762
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") nsidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps") pidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid, *val;
> +	char fmt[] = "ERROR nspid:%d\n";
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid)
> +		return 0;
> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo.nsid;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
> new file mode 100644
> index 000000000000..a7254055f294
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "syscalls/sys_enter_nanosleep";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 knsid = 0;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;
> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
> +		  "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}
> 

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Tao Ren @ 2019-08-08  4:48 UTC (permalink / raw)
  To: Andrew Lunn, Jakub Kicinski
  Cc: Samuel Mendoza-Jonas, David S . Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, openbmc@lists.ozlabs.org,
	William Kennington, Joel Stanley
In-Reply-To: <20190807184143.GE26047@lunn.ch>

On 8/7/19 11:41 AM, Andrew Lunn wrote:
> On Wed, Aug 07, 2019 at 11:25:18AM -0700, Jakub Kicinski wrote:
>> On Tue, 6 Aug 2019 17:21:18 -0700, Tao Ren wrote:
>>> Currently BMC's MAC address is calculated by adding 1 to NCSI NIC's base
>>> MAC address when CONFIG_NCSI_OEM_CMD_GET_MAC option is enabled. The logic
>>> doesn't work for platforms with different BMC MAC offset: for example,
>>> Facebook Yamp BMC's MAC address is calculated by adding 2 to NIC's base
>>> MAC address ("BaseMAC + 1" is reserved for Host use).
>>>
>>> This patch adds NET_NCSI_MC_MAC_OFFSET config option to customize offset
>>> between NIC's Base MAC address and BMC's MAC address. Its default value is
>>> set to 1 to avoid breaking existing users.
>>>
>>> Signed-off-by: Tao Ren <taoren@fb.com>
>>
>> Maybe someone more knowledgeable like Andrew has an opinion here, 
>> but to me it seems a bit strange to encode what seems to be platfrom
>> information in the kernel config :(
> 
> Yes, this is not a good idea. It makes it impossible to have a 'BMC
> distro' kernel which you install on a number of different BMCs.
> 
> A device tree property would be better. Ideally it would be the
> standard local-mac-address, or mac-address.

Thank you Andrew and Jakub for the feedback. I picked up kconfig approach mainly because it's an OEM-only extention which is used only when NCSI_OEM_CMD_GET_MAC is enabled.

Let me prepare patch v2 using device tree. I'm not sure if standard "mac-address" fits this situation because all we need is an offset (integer) and BMC MAC is calculated by adding the offset to NIC's MAC address. Anyways, let me work out v2 patch we can discuss more then.


Thanks,

Tao

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Tao Ren @ 2019-08-08  4:51 UTC (permalink / raw)
  To: Vijay Khemka, Samuel Mendoza-Jonas, David S . Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org, William Kennington, Joel Stanley
In-Reply-To: <75DDAF9A-DABC-4670-BEC0-320185017642@fb.com>

On 8/7/19 10:36 AM, Vijay Khemka wrote:
> Lgtm except one small comment below.
> 
> On 8/6/19, 5:22 PM, "openbmc on behalf of Tao Ren" <openbmc-bounces+vijaykhemka=fb.com@lists.ozlabs.org on behalf of taoren@fb.com> wrote:
> 
>     Currently BMC's MAC address is calculated by adding 1 to NCSI NIC's base
>     MAC address when CONFIG_NCSI_OEM_CMD_GET_MAC option is enabled. The logic
>     doesn't work for platforms with different BMC MAC offset: for example,
>     Facebook Yamp BMC's MAC address is calculated by adding 2 to NIC's base
>     MAC address ("BaseMAC + 1" is reserved for Host use).
>     
>     This patch adds NET_NCSI_MC_MAC_OFFSET config option to customize offset
>     between NIC's Base MAC address and BMC's MAC address. Its default value is
>     set to 1 to avoid breaking existing users.
>     
>     Signed-off-by: Tao Ren <taoren@fb.com>
>     ---
>      net/ncsi/Kconfig    |  8 ++++++++
>      net/ncsi/ncsi-rsp.c | 15 +++++++++++++--
>      2 files changed, 21 insertions(+), 2 deletions(-)
>     
>     diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
>     index 2f1e5756c03a..be8efe1ed99e 100644
>     --- a/net/ncsi/Kconfig
>     +++ b/net/ncsi/Kconfig
>     @@ -17,3 +17,11 @@ config NCSI_OEM_CMD_GET_MAC
>      	---help---
>      	  This allows to get MAC address from NCSI firmware and set them back to
>      		controller.
>     +config NET_NCSI_MC_MAC_OFFSET
>     +	int
>     +	prompt "Offset of Management Controller's MAC Address"
>     +	depends on NCSI_OEM_CMD_GET_MAC
>     +	default 1
>     +	help
>     +	  This defines the offset between Network Controller's (base) MAC
>     +	  address and Management Controller's MAC address.
>     diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
>     index 7581bf919885..24a791f9ebf5 100644
>     --- a/net/ncsi/ncsi-rsp.c
>     +++ b/net/ncsi/ncsi-rsp.c
>     @@ -656,6 +656,11 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
>      	struct ncsi_rsp_oem_pkt *rsp;
>      	struct sockaddr saddr;
>      	int ret = 0;
>     +#ifdef CONFIG_NET_NCSI_MC_MAC_OFFSET
>     +	int mac_offset = CONFIG_NET_NCSI_MC_MAC_OFFSET;
>     +#else
>     +	int mac_offset = 1;
>     +#endif
>      
>      	/* Get the response header */
>      	rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
>     @@ -663,8 +668,14 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
>      	saddr.sa_family = ndev->type;
>      	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
>      	memcpy(saddr.sa_data, &rsp->data[BCM_MAC_ADDR_OFFSET], ETH_ALEN);
>     -	/* Increase mac address by 1 for BMC's address */
>     -	eth_addr_inc((u8 *)saddr.sa_data);
>     +
>     +	/* Management Controller's MAC address is calculated by adding
>     +	 * the offset to Network Controller's (base) MAC address.
>     +	 * Note: negative offset is "ignored", and BMC will use the Base
> Just mention negative and zero offset is ignored. As you are ignoring 0 as well.

Thank you Vijay for the review.

Zero offset is not ignored: users get what they want when setting offset to 0 (BMC-MAC = Base-MAC).


Thanks,

Tao

^ permalink raw reply

* [PATCH] liquidio: Use pcie_flr() instead of reimplementing it
From: Denis Efremov @ 2019-08-08  4:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, Derek Chickles, Satanand Burla, Felix Manlunas,
	netdev, linux-pci, linux-kernel

octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
bit, which bypasses timing requirements imposed by the PCIe spec.
This patch fixes the function to use the pcie_flr() interface instead.

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index 021d99cd1665..614d07be7181 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -260,9 +260,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
 		dev_info(&oct->pci_dev->dev,
 			 "got a request for FLR from VF that owns DPI ring %u\n",
 			 mbox->q_no);
-		pcie_capability_set_word(
-			oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no],
-			PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
+		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
 		break;
 
 	case OCTEON_PF_CHANGED_VF_MACADDR:
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v5 bpf-next] BPF: helpers: New helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-08  5:09 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com,
	quentin.monnet@netronome.com
In-Reply-To: <20190808012240.htbgpv2mhktvig5h@dev00>



On 8/7/19 6:22 PM, Carlos Antonio Neira Bustos wrote:
> The code has been modified to avoid syscalls that could sleep.
> Please let me know if any other modification is needed.
> 
>  From be0384c0fa209a78c1567936e8db4e35b9a7c0f8 Mon Sep 17 00:00:00 2001
> From: Carlos <cneirabustos@gmail.com>
> Date: Wed, 7 Aug 2019 20:04:30 -0400
> Subject: [PATCH] [PATCH v5 bpf-next] BPF: New helper to obtain namespace data
>   from current task
> 
> This helper obtains the active namespace from current and returns pid, tgid,
> device and namespace id as seen from that namespace, allowing to instrument
> a process inside a container.
> Device is read from /proc/self/ns/pid, as in the future it's possible that
> different pid_ns files may belong to different devices, according
> to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
> conference.
> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> scripts but this helper returns the pid as seen by the root namespace which is
> fine when a bcc script is not executed inside a container.
> When the process of interest is inside a container, pid filtering will not work
> if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
> returning the pid as it's seen by the current namespace where the script is
> executing.
> 
> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> used to do pid filtering even inside a container.
> 
> For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):
> 
>          u32 pid = bpf_get_current_pid_tgid() >> 32;
>          if (pid != <pid_arg_passed_in>)
>                  return 0;
> Could be modified to use bpf_get_current_pidns_info() as follows:
> 
>          struct bpf_pidns pidns;
>          bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
>          u32 pid = pidns.tgid;
>          u32 nsid = pidns.nsid;
>          if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
>                  return 0;
> 
> To find out the name PID namespace id of a process, you could use this command:
> 
> $ ps -h -o pidns -p <pid_of_interest>
> 
> Or this other command:
> 
> $ ls -Li /proc/<pid_of_interest>/ns/pid
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/namei.c                                         |   2 +-
>   include/linux/bpf.h                                |   1 +
>   include/linux/namei.h                              |   4 +
>   include/uapi/linux/bpf.h                           |  29 ++++-
>   kernel/bpf/core.c                                  |   1 +
>   kernel/bpf/helpers.c                               |  78 ++++++++++++
>   kernel/trace/bpf_trace.c                           |   2 +
>   samples/bpf/Makefile                               |   3 +
>   samples/bpf/trace_ns_info_user.c                   |  35 ++++++
>   samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++
>   tools/include/uapi/linux/bpf.h                     |  29 ++++-
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
>   15 files changed, 418 insertions(+), 4 deletions(-)
>   create mode 100644 samples/bpf/trace_ns_info_user.c
>   create mode 100644 samples/bpf/trace_ns_info_user_kern.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..d1eca36972d2 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> @@ -2355,6 +2354,7 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
>   	putname(name);
>   	return retval;
>   }
> +EXPORT_SYMBOL(filename_lookup);

No need to export symbols. bpf uses it and bpf is in the core, not in 
modules.

>   
>   /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
>   static int path_parentat(struct nameidata *nd, unsigned flags,
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..2c24e8c71d46 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned int flags,
> +		    struct path *path, struct path *root);

The previous definition in fs/internal.h should be removed.

> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..6f601f7106e2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,26 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL**  if unable to get ns, pid or tgid of current task.
> + *		Or if size_of_pidns is not valid.

Maybe reword by following the code sequence.
    if *size_of_pidns* is not valid or unable to get ns, pid or tgid of
    the current task.

> + *
> + *		**-ENOMEM**  if allocation fails.

Maybe some other error codes in filename_lookup() function?

> + *
> + *		If unable to get the inode from /proc/self/ns/pid an error code
> + *		will be returned.

You do not need this. The description of error code cases should cover this.

>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2873,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3625,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..571f24077db2 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,78 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *name = "/proc/self/ns/pid";

maybe rename this variable to pidns_path?

> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;

Maybe rename this variable to name?

> +	int len = strlen(name) + 1;

We can delay this assignment later until it is needed.

> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;
> +
> +	pidns = task_active_pid_ns(current);
> +

we can save an empty line here.

> +	if (unlikely(!pidns))
> +		goto clear;
> +
> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +

We can save an empty line here.

> +	if (unlikely(!pid))
> +		goto clear;
> +
> +	tgid = task_tgid_nr_ns(current, pidns);
> +
ditto. save an empty line.
> +	if (unlikely(!tgid))
> +		goto clear;
> +
> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;
> +
> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;
> +	}
> +
> +	memcpy((char *)tmp->name, name, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +
ditto. save an empty line.
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> +
ditto. save an empty line.
> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;
> +	}
> +
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +
> +	return 0;
> +
> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +
save an empty line.
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func	= bpf_get_current_pidns_info,
make the "= " aligned with others?
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 1d9be26b4edd..238453ff27d2 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
>   hostprogs-y += xdp_sample_pkts
>   hostprogs-y += ibumad
>   hostprogs-y += hbm
> +hostprogs-y += trace_ns_info
[...]

^ permalink raw reply

* Re: [PATCH net 1/2] mlxsw: spectrum: Fix error path in mlxsw_sp_module_init()
From: Jesse Brandeburg @ 2019-08-08  5:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, jiri, petrm, mlxsw, Ido Schimmel, jesse.brandeburg
In-Reply-To: <20190731063315.9381-2-idosch@idosch.org>

On Wed, 31 Jul 2019 09:33:14 +0300
Ido Schimmel <idosch@idosch.org> wrote:

> From: Jiri Pirko <jiri@mellanox.com>
> 
> In case of sp2 pci driver registration fail, fix the error path to
> start with sp1 pci driver unregister.
> 
> Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH net 2/2] mlxsw: spectrum_buffers: Further reduce pool size on Spectrum-2
From: Jesse Brandeburg @ 2019-08-08  5:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, jiri, petrm, mlxsw, Ido Schimmel, jesse.brandeburg
In-Reply-To: <20190731063315.9381-3-idosch@idosch.org>

On Wed, 31 Jul 2019 09:33:15 +0300
Ido Schimmel <idosch@idosch.org> wrote:

> From: Petr Machata <petrm@mellanox.com>
> 
> In commit e891ce1dd2a5 ("mlxsw: spectrum_buffers: Reduce pool size on
> Spectrum-2"), pool size was reduced to mitigate a problem in port buffer
> usage of ports split four ways. It turns out that this work around does not
> solve the issue, and a further reduction is required.
> 
> Thus reduce the size of pool 0 by another 2.7 MiB, and round down to the
> whole number of cells.
> 
> Fixes: e891ce1dd2a5 ("mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2")
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH] net: ethernet: et131x: Use GFP_KERNEL instead of GFP_ATOMIC when allocating tx_ring->tcb_ring
From: Jesse Brandeburg @ 2019-08-08  5:23 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: mark.einon, davem, willy, f.fainelli, andrew, netdev,
	linux-kernel, kernel-janitors, jesse.brandeburg
In-Reply-To: <20190731073842.16948-1-christophe.jaillet@wanadoo.fr>

On Wed, 31 Jul 2019 09:38:42 +0200
Christophe JAILLET <christophe.jaillet@wanadoo.fr> wrote:

> There is no good reason to use GFP_ATOMIC here. Other memory allocations
> are performed with GFP_KERNEL (see other 'dma_alloc_coherent()' below and
> 'kzalloc()' in 'et131x_rx_dma_memory_alloc()')
> 
> Use GFP_KERNEL which should be enough.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Sure, but generally I'd say GFP_ATOMIC is ok if you're in an init path
and you can afford to have the allocation thread sleep while memory is
being found by the kernel.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* [PATCH] net/netfilter - add missing prototypes.
From: Valdis Klētnieks @ 2019-08-08  5:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, linux-kernel

Sparse rightly complains about undeclared symbols.

  CHECK   net/netfilter/nft_set_hash.c
net/netfilter/nft_set_hash.c:647:21: warning: symbol 'nft_set_rhash_type' was not declared. Should it be static?
net/netfilter/nft_set_hash.c:670:21: warning: symbol 'nft_set_hash_type' was not declared. Should it be static?
net/netfilter/nft_set_hash.c:690:21: warning: symbol 'nft_set_hash_fast_type' was not declared. Should it be static?
  CHECK   net/netfilter/nft_set_bitmap.c
net/netfilter/nft_set_bitmap.c:296:21: warning: symbol 'nft_set_bitmap_type' was not declared. Should it be static?
  CHECK   net/netfilter/nft_set_rbtree.c
net/netfilter/nft_set_rbtree.c:470:21: warning: symbol 'nft_set_rbtree_type' was not declared. Should it be static?

Include nf_tables_core.h rather than nf_tables.h to pick up the additional definitions.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index b5aeccdddb22..087a056e34d1 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -10,7 +10,7 @@
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
 
 struct nft_bitmap_elem {
 	struct list_head	head;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 6e8d20c03e3d..c490451fcebf 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -16,7 +16,7 @@
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
 
 /* We target a hash table size of 4, element hint is 75% of final size */
 #define NFT_RHASH_ELEMENT_HINT 3
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 419d58ef802b..57123259452f 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -13,7 +13,7 @@
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
 
 struct nft_rbtree {
 	struct rb_root		root;


^ permalink raw reply related

* Re: [PATCH 1/2] net: ag71xx: Slighly simplify code in 'ag71xx_rings_init()'
From: Jesse Brandeburg @ 2019-08-08  5:29 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: jcliburn, davem, chris.snook, netdev, linux-kernel,
	kernel-janitors, jesse.brandeburg
In-Reply-To: <08fbcfe0f913644fe538656221a15790a1a83f1d.1564560130.git.christophe.jaillet@wanadoo.fr>

On Wed, 31 Jul 2019 10:06:38 +0200
Christophe JAILLET <christophe.jaillet@wanadoo.fr> wrote:

> A few lines above, we have:
>    tx_size = BIT(tx->order);
> 
> So use 'tx_size' directly to be consistent with the way 'rx->descs_cpu' and
> 'rx->descs_dma' are computed below.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH 2/2] net: ag71xx: Use GFP_KERNEL instead of GFP_ATOMIC in 'ag71xx_rings_init()'
From: Jesse Brandeburg @ 2019-08-08  5:30 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: jcliburn, davem, chris.snook, netdev, linux-kernel,
	kernel-janitors, jesse.brandeburg
In-Reply-To: <75ee52ae065ce9cb1f32d88939b166773316dc45.1564560130.git.christophe.jaillet@wanadoo.fr>

On Wed, 31 Jul 2019 10:06:48 +0200
Christophe JAILLET <christophe.jaillet@wanadoo.fr> wrote:

> There is no need to use GFP_ATOMIC here, GFP_KERNEL should be enough.
> The 'kcalloc()' just a few lines above, already uses GFP_KERNEL.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH] myri10ge: remove unneeded variable
From: Jesse Brandeburg @ 2019-08-08  5:31 UTC (permalink / raw)
  To: Ding Xiang; +Cc: christopher.lee, davem, netdev, linux-kernel, jesse.brandeburg
In-Reply-To: <1564563226-13367-1-git-send-email-dingxiang@cmss.chinamobile.com>

On Wed, 31 Jul 2019 16:53:46 +0800
Ding Xiang <dingxiang@cmss.chinamobile.com> wrote:

> "error" is unneeded,just return 0
> 
> Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH v3 38/41] powerpc: convert put_page() to put_user_page*()
From: Michael Ellerman @ 2019-08-08  5:42 UTC (permalink / raw)
  To: john.hubbard, Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Benjamin Herrenschmidt,
	Christoph Hellwig, linuxppc-dev
In-Reply-To: <20190807013340.9706-39-jhubbard@nvidia.com>

Hi John,

john.hubbard@gmail.com writes:
> diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
> index b056cae3388b..e126193ba295 100644
> --- a/arch/powerpc/mm/book3s64/iommu_api.c
> +++ b/arch/powerpc/mm/book3s64/iommu_api.c
> @@ -203,6 +202,7 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
>  {
>  	long i;
>  	struct page *page = NULL;
> +	bool dirty = false;

I don't think you need that initialisation do you?

>  	if (!mem->hpas)
>  		return;
> @@ -215,10 +215,9 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
>  		if (!page)
>  			continue;
>  
> -		if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
> -			SetPageDirty(page);
> +		dirty = mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY;
> -		put_page(page);
> +		put_user_pages_dirty_lock(&page, 1, dirty);
>  		mem->hpas[i] = 0;
>  	}
>  }

cheers

^ permalink raw reply

* [PATCH] net/netfilter/nf_nat_proto.c - make tables static
From: Valdis Klētnieks @ 2019-08-08  5:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, linux-kernel

Sparse warns about two tables not being declared.

  CHECK   net/netfilter/nf_nat_proto.c
net/netfilter/nf_nat_proto.c:725:26: warning: symbol 'nf_nat_ipv4_ops' was not declared. Should it be static?
net/netfilter/nf_nat_proto.c:964:26: warning: symbol 'nf_nat_ipv6_ops' was not declared. Should it be static?

And in fact they can indeed be static.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
index 7ac733ebd060..0a59c14b5177 100644
--- a/net/netfilter/nf_nat_proto.c
+++ b/net/netfilter/nf_nat_proto.c
@@ -722,7 +722,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
 	return ret;
 }
 
-const struct nf_hook_ops nf_nat_ipv4_ops[] = {
+static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
 	/* Before packet filtering, change destination */
 	{
 		.hook		= nf_nat_ipv4_in,
@@ -961,7 +961,7 @@ nf_nat_ipv6_local_fn(void *priv, struct sk_buff *skb,
 	return ret;
 }
 
-const struct nf_hook_ops nf_nat_ipv6_ops[] = {
+static const struct nf_hook_ops nf_nat_ipv6_ops[] = {
 	/* Before packet filtering, change destination */
 	{
 		.hook		= nf_nat_ipv6_in,


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox