Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] usbnet: smsc75xx: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: David Miller @ 2019-07-22 19:13 UTC (permalink / raw)
  To: hslester96; +Cc: steve.glendinning, netdev, linux-usb, linux-kernel
In-Reply-To: <20190719082730.6378-1-hslester96@gmail.com>

From: Chuhong Yuan <hslester96@gmail.com>
Date: Fri, 19 Jul 2019 16:27:31 +0800

> Merge the combo use of memcpy and le32_to_cpus.
> Use get_unaligned_le32 instead.
> This simplifies the code.
> 
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ax88179_178a: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: David Miller @ 2019-07-22 19:13 UTC (permalink / raw)
  To: hslester96; +Cc: linux-usb, netdev, linux-kernel
In-Reply-To: <20190719090714.19037-1-hslester96@gmail.com>

From: Chuhong Yuan <hslester96@gmail.com>
Date: Fri, 19 Jul 2019 17:07:15 +0800

> Merge the combo use of memcpy and le32_to_cpus.
> Use get_unaligned_le32 instead.
> This simplifies the code.
> 
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: lan78xx: Merge memcpy + lexx_to_cpus to get_unaligned_lexx
From: David Miller @ 2019-07-22 19:13 UTC (permalink / raw)
  To: hslester96; +Cc: woojung.huh, UNGLinuxDriver, netdev, linux-usb, linux-kernel
In-Reply-To: <20190719073614.1850-1-hslester96@gmail.com>

From: Chuhong Yuan <hslester96@gmail.com>
Date: Fri, 19 Jul 2019 15:36:15 +0800

> Merge the combo use of memcpy and lexx_to_cpus.
> Use get_unaligned_lexx instead.
> This simplifies the code.
> 
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier
From: David Miller @ 2019-07-22 19:11 UTC (permalink / raw)
  To: zenczykowski; +Cc: maze, netdev, lorenzo, reminv, raorn
In-Reply-To: <20190719063003.10684-1-zenczykowski@gmail.com>

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Thu, 18 Jul 2019 23:30:03 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> This is trivial since we already have support for the entirely
> identical (from the kernel's point of view) RDNSS and DNSSL that
> also contain opaque data that needs to be passed down to userspace.
> 
> As specified in RFC7710, Captive Portal option contains a URL.
> 8-bit identifier of the option type as assigned by the IANA is 37.
> This option should also be treated as userland.
> 
> Hence, treat ND option 37 as userland (Captive Portal support)
> 
> See:
>   https://tools.ietf.org/html/rfc7710
>   https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml
> 
> Fixes: e35f30c131a56
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-22 19:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Miklos Szeredi, Ming Lei,
	Sage Weil, Santosh Shilimkar, Yan Zheng, netdev, dri-devel,
	linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722190722.GF363@bombadil.infradead.org>

On 7/22/19 12:07 PM, Matthew Wilcox wrote:
> On Mon, Jul 22, 2019 at 11:53:54AM -0700, John Hubbard wrote:
>> On 7/22/19 2:33 AM, Christoph Hellwig wrote:
>>> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
>>>>  		for (i = 0; i < vsg->num_pages; ++i) {
>>>>  			if (NULL != (page = vsg->pages[i])) {
>>>>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
>>>> -					SetPageDirty(page);
>>>> -				put_page(page);
>>>> +					put_user_pages_dirty(&page, 1);
>>>> +				else
>>>> +					put_user_page(page);
>>>>  			}
>>>
>>> Can't just pass a dirty argument to put_user_pages?  Also do we really
>>
>> Yes, and in fact that would help a lot more than the single page case,
>> which is really just cosmetic after all.
>>
>>> need a separate put_user_page for the single page case?
>>> put_user_pages_dirty?
>>
>> Not really. I'm still zeroing in on the ideal API for all these call sites,
>> and I agree that the approach below is cleaner.
> 
> so enum { CLEAN = 0, DIRTY = 1, LOCK = 2, DIRTY_LOCK = 3 };
> ?
> 

Sure. In fact, I just applied something similar to bio_release_pages()
locally, in order to reconcile Christoph's and Jerome's approaches 
(they each needed to add a bool arg), so I'm all about the enums in the
arg lists. :)

I'm going to post that one shortly, let's see how it goes over. heh.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH net] bonding: Force slave speed check after link state recovery for 802.3ad
From: David Miller @ 2019-07-22 19:09 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, bjking1, pradeep, jarod, j.vosburgh, vfalico, andy
In-Reply-To: <1563315910-25634-1-git-send-email-tlfalcon@linux.ibm.com>

From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Tue, 16 Jul 2019 17:25:10 -0500

> The following scenario was encountered during testing of logical
> partition mobility on pseries partitions with bonded ibmvnic
> adapters in LACP mode.
> 
> 1. Driver receives a signal that the device has been
>    swapped, and it needs to reset to initialize the new
>    device.
> 
> 2. Driver reports loss of carrier and begins initialization.
> 
> 3. Bonding driver receives NETDEV_CHANGE notifier and checks
>    the slave's current speed and duplex settings. Because these
>    are unknown at the time, the bond sets its link state to
>    BOND_LINK_FAIL and handles the speed update, clearing
>    AD_PORT_LACP_ENABLE.
> 
> 4. Driver finishes recovery and reports that the carrier is on.
> 
> 5. Bond receives a new notification and checks the speed again.
>    The speeds are valid but miimon has not altered the link
>    state yet.  AD_PORT_LACP_ENABLE remains off.
> 
> Because the slave's link state is still BOND_LINK_FAIL,
> no further port checks are made when it recovers. Though
> the slave devices are operational and have valid speed
> and duplex settings, the bond will not send LACPDU's. The
> simplest fix I can see is to force another speed check
> in bond_miimon_commit. This way the bond will update
> AD_PORT_LACP_ENABLE if needed when transitioning from
> BOND_LINK_FAIL to BOND_LINK_UP.
> 
> CC: Jarod Wilson <jarod@redhat.com>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

Applied, thanks Thomas.

^ permalink raw reply

* Re: [PATCH] net/mlx5: fix -Wtype-limits compilation warnings
From: David Miller @ 2019-07-22 19:09 UTC (permalink / raw)
  To: cai; +Cc: saeedm, leonro, yishaih, netdev, linux-rdma, linux-kernel
In-Reply-To: <1563820482-10302-1-git-send-email-cai@lca.pw>

From: Qian Cai <cai@lca.pw>
Date: Mon, 22 Jul 2019 14:34:42 -0400

> The commit b9a7ba556207 ("net/mlx5: Use event mask based on device
> capabilities") introduced a few compilation warnings due to it bumps
> MLX5_EVENT_TYPE_MAX from 0x27 to 0x100 which is always greater than
> an "struct {mlx5_eqe|mlx5_nb}.type" that is an "u8".
> 
> drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
> 'mlx5_eq_notifier_register':
> drivers/net/ethernet/mellanox/mlx5/core/eq.c:948:21: warning: comparison
> is always false due to limited range of data type [-Wtype-limits]
>   if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
>                      ^~
> drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
> 'mlx5_eq_notifier_unregister':
> drivers/net/ethernet/mellanox/mlx5/core/eq.c:959:21: warning: comparison
> is always false due to limited range of data type [-Wtype-limits]
>   if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
> 
> Fix them by removing unnecessary checkings.
> 
> Fixes: b9a7ba556207 ("net/mlx5: Use event mask based on device capabilities")
> Signed-off-by: Qian Cai <cai@lca.pw>

Saeed, I am assuming that you will take this.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: Matthew Wilcox @ 2019-07-22 19:07 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Miklos Szeredi, Ming Lei,
	Sage Weil, Santosh Shilimkar, Yan Zheng, netdev, dri-devel,
	linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <397ff3e4-e857-037a-1aee-ff6242e024b2@nvidia.com>

On Mon, Jul 22, 2019 at 11:53:54AM -0700, John Hubbard wrote:
> On 7/22/19 2:33 AM, Christoph Hellwig wrote:
> > On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
> >>  		for (i = 0; i < vsg->num_pages; ++i) {
> >>  			if (NULL != (page = vsg->pages[i])) {
> >>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
> >> -					SetPageDirty(page);
> >> -				put_page(page);
> >> +					put_user_pages_dirty(&page, 1);
> >> +				else
> >> +					put_user_page(page);
> >>  			}
> > 
> > Can't just pass a dirty argument to put_user_pages?  Also do we really
> 
> Yes, and in fact that would help a lot more than the single page case,
> which is really just cosmetic after all.
> 
> > need a separate put_user_page for the single page case?
> > put_user_pages_dirty?
> 
> Not really. I'm still zeroing in on the ideal API for all these call sites,
> and I agree that the approach below is cleaner.

so enum { CLEAN = 0, DIRTY = 1, LOCK = 2, DIRTY_LOCK = 3 };
?

^ permalink raw reply

* Re: [PATCH 3/3] gup: new put_user_page_dirty*() helpers
From: John Hubbard @ 2019-07-22 19:05 UTC (permalink / raw)
  To: john.hubbard, Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722043012.22945-4-jhubbard@nvidia.com>

On 7/21/19 9:30 PM, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> While converting call sites to use put_user_page*() [1], quite a few
> places ended up needing a single-page routine to put and dirty a
> page.
> 
> Provide put_user_page_dirty() and put_user_page_dirty_lock(),
> and use them in a few places: net/xdp, drm/via/, drivers/infiniband.
> 

Please disregard this one, I'm going to drop it, as per the
discussion in patch 1.

thanks,
-- 
John Hubbard
NVIDIA

> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/drm/via/via_dmablit.c        |  2 +-
>  drivers/infiniband/core/umem.c           |  2 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
>  include/linux/mm.h                       | 10 ++++++++++
>  net/xdp/xdp_umem.c                       |  2 +-
>  5 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
> index 219827ae114f..d30b2d75599f 100644
> --- a/drivers/gpu/drm/via/via_dmablit.c
> +++ b/drivers/gpu/drm/via/via_dmablit.c
> @@ -189,7 +189,7 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
>  		for (i = 0; i < vsg->num_pages; ++i) {
>  			if (NULL != (page = vsg->pages[i])) {
>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
> -					put_user_pages_dirty(&page, 1);
> +					put_user_page_dirty(page);
>  				else
>  					put_user_page(page);
>  			}
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 08da840ed7ee..a7337cc3ca20 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
>  		if (umem->writable && dirty)
> -			put_user_pages_dirty_lock(&page, 1);
> +			put_user_page_dirty_lock(page);
>  		else
>  			put_user_page(page);
>  	}
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 0b0237d41613..d2ded624fb2a 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -76,7 +76,7 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
>  			page = sg_page(sg);
>  			pa = sg_phys(sg);
>  			if (dirty)
> -				put_user_pages_dirty_lock(&page, 1);
> +				put_user_page_dirty_lock(page);
>  			else
>  				put_user_page(page);
>  			usnic_dbg("pa: %pa\n", &pa);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0334ca97c584..c0584c6d9d78 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1061,6 +1061,16 @@ void put_user_pages_dirty(struct page **pages, unsigned long npages);
>  void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>  void put_user_pages(struct page **pages, unsigned long npages);
>  
> +static inline void put_user_page_dirty(struct page *page)
> +{
> +	put_user_pages_dirty(&page, 1);
> +}
> +
> +static inline void put_user_page_dirty_lock(struct page *page)
> +{
> +	put_user_pages_dirty_lock(&page, 1);
> +}
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 9cbbb96c2a32..1d122e52c6de 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,7 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  	for (i = 0; i < umem->npgs; i++) {
>  		struct page *page = umem->pgs[i];
>  
> -		put_user_pages_dirty_lock(&page, 1);
> +		put_user_page_dirty_lock(page);
>  	}
>  
>  	kfree(umem->pgs);
> 

^ permalink raw reply

* net-next is OPEN...
From: David Miller @ 2019-07-22 19:03 UTC (permalink / raw)
  To: netdev


Flood me...

^ permalink raw reply

* Re: [PATCH v2 6/7] dt-bindings: net: realtek: Add property to configure LED mode
From: Andrew Lunn @ 2019-07-22 19:01 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Rob Herring, Florian Fainelli, David S . Miller, Mark Rutland,
	Heiner Kallweit, netdev, devicetree, linux-kernel@vger.kernel.org,
	Douglas Anderson
In-Reply-To: <20190722171418.GV250418@google.com>

On Mon, Jul 22, 2019 at 10:14:18AM -0700, Matthias Kaehlcke wrote:
> I'm working on a generic binding.
> 
> I wonder what is the best process for reviewing/landing it, I'm
> doubting between two options:
> 
> a) only post the binding doc and the generic PHY code that reads
>    the configuration from the DT. Post Realtek patches once
>    the binding/generic code has been acked.
> 
>    pros: no churn from Realtek specific patches
>    cons: initially no (real) user of the new binding
> 
> b) post generic and Realtek changes together
> 
>    pros: the binding has a user initially
>    cons: churn from Realtek specific patches
> 
> I can do either, depending on what maintainers/reviewers prefer. I'm
> slightly inclined towards a)

Hi Matthias

It is normal to include one user of any generic API which is added,
just to make is clear how an API should be used.

     Andrew

^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-22 18:53 UTC (permalink / raw)
  To: Christoph Hellwig, john.hubbard
  Cc: Andrew Morton, Alexander Viro, Björn Töpel,
	Boaz Harrosh, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722093355.GB29538@lst.de>

On 7/22/19 2:33 AM, Christoph Hellwig wrote:
> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
>>  		for (i = 0; i < vsg->num_pages; ++i) {
>>  			if (NULL != (page = vsg->pages[i])) {
>>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
>> -					SetPageDirty(page);
>> -				put_page(page);
>> +					put_user_pages_dirty(&page, 1);
>> +				else
>> +					put_user_page(page);
>>  			}
> 
> Can't just pass a dirty argument to put_user_pages?  Also do we really

Yes, and in fact that would help a lot more than the single page case,
which is really just cosmetic after all.

> need a separate put_user_page for the single page case?
> put_user_pages_dirty?

Not really. I'm still zeroing in on the ideal API for all these call sites,
and I agree that the approach below is cleaner.

> 
> Also the PageReserved check looks bogus, as I can't see how a reserved
> page can end up here.  So IMHO the above snippled should really look
> something like this:
> 
> 	put_user_pages(vsg->pages[i], vsg->num_pages,
> 			vsg->direction == DMA_FROM_DEVICE);
> 
> in the end.
> 

Agreed.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* [PATCH] net/mlx5: fix -Wtype-limits compilation warnings
From: Qian Cai @ 2019-07-22 18:34 UTC (permalink / raw)
  To: saeedm, leonro; +Cc: yishaih, davem, netdev, linux-rdma, linux-kernel, Qian Cai

The commit b9a7ba556207 ("net/mlx5: Use event mask based on device
capabilities") introduced a few compilation warnings due to it bumps
MLX5_EVENT_TYPE_MAX from 0x27 to 0x100 which is always greater than
an "struct {mlx5_eqe|mlx5_nb}.type" that is an "u8".

drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_register':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:948:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
  if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
                     ^~
drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_unregister':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:959:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
  if (nb->event_type >= MLX5_EVENT_TYPE_MAX)

Fix them by removing unnecessary checkings.

Fixes: b9a7ba556207 ("net/mlx5: Use event mask based on device capabilities")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 41f25ea2e8d9..2df9aaa421c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -215,11 +215,7 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
 		 */
 		dma_rmb();
 
-		if (likely(eqe->type < MLX5_EVENT_TYPE_MAX))
-			atomic_notifier_call_chain(&eqt->nh[eqe->type], eqe->type, eqe);
-		else
-			mlx5_core_warn_once(dev, "notifier_call_chain is not setup for eqe: %d\n", eqe->type);
-
+		atomic_notifier_call_chain(&eqt->nh[eqe->type], eqe->type, eqe);
 		atomic_notifier_call_chain(&eqt->nh[MLX5_EVENT_TYPE_NOTIFY_ANY], eqe->type, eqe);
 
 		++eq->cons_index;
@@ -945,9 +941,6 @@ int mlx5_eq_notifier_register(struct mlx5_core_dev *dev, struct mlx5_nb *nb)
 {
 	struct mlx5_eq_table *eqt = dev->priv.eq_table;
 
-	if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
-		return -EINVAL;
-
 	return atomic_notifier_chain_register(&eqt->nh[nb->event_type], &nb->nb);
 }
 EXPORT_SYMBOL(mlx5_eq_notifier_register);
@@ -956,9 +949,6 @@ int mlx5_eq_notifier_unregister(struct mlx5_core_dev *dev, struct mlx5_nb *nb)
 {
 	struct mlx5_eq_table *eqt = dev->priv.eq_table;
 
-	if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
-		return -EINVAL;
-
 	return atomic_notifier_chain_unregister(&eqt->nh[nb->event_type], &nb->nb);
 }
 EXPORT_SYMBOL(mlx5_eq_notifier_unregister);
-- 
1.8.3.1


^ permalink raw reply related

* [RFC PATCH net-next 12/12] drop_monitor: Add a command to query current configuration
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Users should be able to query the current configuration of drop monitor
before they start using it. Add a command to query the existing
configuration which currently consists of alert mode and packet
truncation length.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/uapi/linux/net_dropmon.h |  2 ++
 net/core/drop_monitor.c          | 48 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
index eb36b61485ef..cdcd23a8b72b 100644
--- a/include/uapi/linux/net_dropmon.h
+++ b/include/uapi/linux/net_dropmon.h
@@ -54,6 +54,8 @@ enum {
 	NET_DM_CMD_START,
 	NET_DM_CMD_STOP,
 	NET_DM_CMD_PACKET_ALERT,
+	NET_DM_CMD_CONFIG_GET,
+	NET_DM_CMD_CONFIG_NEW,
 	_NET_DM_CMD_MAX,
 };
 
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 5af1e1e8d4d0..c753cd6c39ec 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -656,6 +656,50 @@ static int net_dm_cmd_trace(struct sk_buff *skb,
 	return -EOPNOTSUPP;
 }
 
+static int net_dm_config_fill(struct sk_buff *msg, struct genl_info *info)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+			  &net_drop_monitor_family, 0, NET_DM_CMD_CONFIG_NEW);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u8(msg, NET_DM_ATTR_ALERT_MODE, net_dm_alert_mode))
+		goto nla_put_failure;
+
+	if (nla_put_u32(msg, NET_DM_ATTR_TRUNC_LEN, net_dm_trunc_len))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int net_dm_cmd_config_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	int rc;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	rc = net_dm_config_fill(msg, info);
+	if (rc)
+		goto err_config_fill;
+
+	return genlmsg_reply(msg, info);
+
+err_config_fill:
+	nlmsg_free(msg);
+	return rc;
+}
+
 static int dropmon_net_event(struct notifier_block *ev_block,
 			     unsigned long event, void *ptr)
 {
@@ -718,6 +762,10 @@ static const struct genl_ops dropmon_ops[] = {
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
 		.doit = net_dm_cmd_trace,
 	},
+	{
+		.cmd = NET_DM_CMD_CONFIG_GET,
+		.doit = net_dm_cmd_config_get,
+	},
 };
 
 static struct genl_family net_drop_monitor_family __ro_after_init = {
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 11/12] drop_monitor: Allow truncation of dropped packets
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

When sending dropped packets to user space it is not always necessary to
copy the entire packet as usually only the headers are of interest.

Allow user to specify the truncation length and add the original length
of the packet as additional metadata to the netlink message.

By default no truncation is performed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/uapi/linux/net_dropmon.h |  2 ++
 net/core/drop_monitor.c          | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
index 7708c8a440a1..eb36b61485ef 100644
--- a/include/uapi/linux/net_dropmon.h
+++ b/include/uapi/linux/net_dropmon.h
@@ -75,6 +75,8 @@ enum net_dm_attr {
 	NET_DM_ATTR_TIMESTAMP,			/* struct timespec */
 	NET_DM_ATTR_PAYLOAD,			/* binary */
 	NET_DM_ATTR_PAD,
+	NET_DM_ATTR_TRUNC_LEN,			/* u32 */
+	NET_DM_ATTR_ORIG_LEN,			/* u32 */
 
 	__NET_DM_ATTR_MAX,
 	NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 512935fc235b..5af1e1e8d4d0 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -77,6 +77,7 @@ static unsigned long dm_hw_check_delta = 2*HZ;
 static LIST_HEAD(hw_stats_list);
 
 static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY;
+static u32 net_dm_trunc_len;
 
 struct net_dm_skb_cb {
 	void *pc;
@@ -332,6 +333,8 @@ static size_t net_dm_packet_report_size(size_t payload_len)
 	       nla_total_size(IFNAMSIZ + 1) +
 	       /* NET_DM_ATTR_TIMESTAMP */
 	       nla_total_size(sizeof(struct timespec)) +
+	       /* NET_DM_ATTR_ORIG_LEN */
+	       nla_total_size(sizeof(u32)) +
 	       /* NET_DM_ATTR_PAYLOAD */
 	       nla_total_size(payload_len);
 }
@@ -369,6 +372,9 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
 	    nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
 		goto nla_put_failure;
 
+	if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
+		goto nla_put_failure;
+
 	if (!payload_len)
 		goto out;
 
@@ -404,6 +410,8 @@ static void net_dm_packet_report(struct sk_buff *skb)
 
 	/* Ensure packet fits inside a single netlink attribute */
 	payload_len = min_t(size_t, skb->len, NET_DM_MAX_PACKET_SIZE);
+	if (net_dm_trunc_len)
+		payload_len = min_t(size_t, net_dm_trunc_len, payload_len);
 
 	msg = nlmsg_new(net_dm_packet_report_size(payload_len), GFP_KERNEL);
 	if (!msg)
@@ -603,6 +611,16 @@ static int net_dm_alert_mode_set(struct genl_info *info)
 	return 0;
 }
 
+static int net_dm_trunc_len_set(struct genl_info *info)
+{
+	if (!info->attrs[NET_DM_ATTR_TRUNC_LEN])
+		return 0;
+
+	net_dm_trunc_len = nla_get_u32(info->attrs[NET_DM_ATTR_TRUNC_LEN]);
+
+	return 0;
+}
+
 static int net_dm_cmd_config(struct sk_buff *skb,
 			struct genl_info *info)
 {
@@ -618,6 +636,10 @@ static int net_dm_cmd_config(struct sk_buff *skb,
 	if (rc)
 		return rc;
 
+	rc = net_dm_trunc_len_set(info);
+	if (rc)
+		return rc;
+
 	return 0;
 }
 
@@ -676,6 +698,7 @@ static int dropmon_net_event(struct notifier_block *ev_block,
 static const struct nla_policy net_dm_nl_policy[NET_DM_ATTR_MAX + 1] = {
 	[NET_DM_ATTR_UNSPEC] = { .strict_start_type = NET_DM_ATTR_UNSPEC + 1 },
 	[NET_DM_ATTR_ALERT_MODE] = { .type = NLA_U8 },
+	[NET_DM_ATTR_TRUNC_LEN] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops dropmon_ops[] = {
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 10/12] drop_monitor: Add packet alert mode
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

So far drop monitor supported only one alert mode in which a summary of
locations in which packets were recently dropped was sent to user space.

This alert mode is sufficient in order to understand that packets were
dropped, but lacks information to perform a more detailed analysis.

Add a new alert mode in which the dropped packet itself is passed to
user space along with metadata: The drop location (as program counter
and resolved symbol), ingress / egress netdevice and arrival / departure
timestamp. More metadata can be added in the future.

To avoid performing expensive operations in the context in which
kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and
queued on per-CPU skb drop list. Then, in process context the netlink
message is allocated, prepared and finally sent to user space.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/uapi/linux/net_dropmon.h |  29 ++++
 net/core/drop_monitor.c          | 287 ++++++++++++++++++++++++++++++-
 2 files changed, 308 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
index 5edbd0a675fd..7708c8a440a1 100644
--- a/include/uapi/linux/net_dropmon.h
+++ b/include/uapi/linux/net_dropmon.h
@@ -53,6 +53,7 @@ enum {
 	NET_DM_CMD_CONFIG,
 	NET_DM_CMD_START,
 	NET_DM_CMD_STOP,
+	NET_DM_CMD_PACKET_ALERT,
 	_NET_DM_CMD_MAX,
 };
 
@@ -62,4 +63,32 @@ enum {
  * Our group identifiers
  */
 #define NET_DM_GRP_ALERT 1
+
+enum net_dm_attr {
+	NET_DM_ATTR_UNSPEC,
+
+	NET_DM_ATTR_ALERT_MODE,			/* u8 */
+	NET_DM_ATTR_PC,				/* u64 */
+	NET_DM_ATTR_SYMBOL,			/* string */
+	NET_DM_ATTR_NETDEV_IFINDEX,		/* u32 */
+	NET_DM_ATTR_NETDEV_NAME,		/* string */
+	NET_DM_ATTR_TIMESTAMP,			/* struct timespec */
+	NET_DM_ATTR_PAYLOAD,			/* binary */
+	NET_DM_ATTR_PAD,
+
+	__NET_DM_ATTR_MAX,
+	NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1
+};
+
+/**
+ * enum net_dm_alert_mode - Alert mode.
+ * @NET_DM_ALERT_MODE_SUMMARY: A summary of recent drops is sent to user space.
+ * @NET_DM_ALERT_MODE_PACKET: Each dropped packet is sent to user space along
+ *                            with metadata.
+ */
+enum net_dm_alert_mode {
+	NET_DM_ALERT_MODE_SUMMARY,
+	NET_DM_ALERT_MODE_PACKET,
+};
+
 #endif
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index f441fb653567..512935fc235b 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -54,6 +54,7 @@ static DEFINE_MUTEX(net_dm_mutex);
 struct per_cpu_dm_data {
 	spinlock_t		lock;	/* Protects 'skb' and 'send_timer' */
 	struct sk_buff		*skb;
+	struct sk_buff_head	drop_queue;
 	struct work_struct	dm_alert_work;
 	struct timer_list	send_timer;
 };
@@ -75,6 +76,22 @@ static int dm_delay = 1;
 static unsigned long dm_hw_check_delta = 2*HZ;
 static LIST_HEAD(hw_stats_list);
 
+static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY;
+
+struct net_dm_skb_cb {
+	void *pc;
+};
+
+#define NET_DM_SKB_CB(__skb) ((struct net_dm_skb_cb *)&((__skb)->cb[0]))
+
+struct net_dm_alert_ops {
+	void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
+				void *location);
+	void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
+				int work, int budget);
+	void (*work_item_func)(struct work_struct *work);
+};
+
 static int net_dm_nl_pre_doit(const struct genl_ops *ops,
 			      struct sk_buff *skb, struct genl_info *info)
 {
@@ -255,10 +272,194 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
 	rcu_read_unlock();
 }
 
+static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
+	.kfree_skb_probe	= trace_kfree_skb_hit,
+	.napi_poll_probe	= trace_napi_poll_hit,
+	.work_item_func		= send_dm_alert,
+};
+
+static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
+					      struct sk_buff *skb,
+					      void *location)
+{
+	struct per_cpu_dm_data *data;
+	struct sk_buff *nskb;
+	unsigned long flags;
+
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (!nskb)
+		return;
+
+	NET_DM_SKB_CB(nskb)->pc = location;
+	if (nskb->dev)
+		dev_hold(nskb->dev);
+
+	data = this_cpu_ptr(&dm_cpu_data);
+	spin_lock_irqsave(&data->drop_queue.lock, flags);
+
+	__skb_queue_tail(&data->drop_queue, nskb);
+
+	if (!timer_pending(&data->send_timer)) {
+		data->send_timer.expires = jiffies + dm_delay * HZ;
+		add_timer(&data->send_timer);
+	}
+
+	spin_unlock_irqrestore(&data->drop_queue.lock, flags);
+}
+
+static void net_dm_packet_trace_napi_poll_hit(void *ignore,
+					      struct napi_struct *napi,
+					      int work, int budget)
+{
+}
+
+#define NET_DM_MAX_SYMBOL_LEN 40
+
+static size_t net_dm_packet_report_size(size_t payload_len)
+{
+	size_t size;
+
+	size = nlmsg_msg_size(GENL_HDRLEN + net_drop_monitor_family.hdrsize);
+
+	return NLMSG_ALIGN(size) +
+	       /* NET_DM_ATTR_PC */
+	       nla_total_size(sizeof(u64)) +
+	       /* NET_DM_ATTR_SYMBOL */
+	       nla_total_size(NET_DM_MAX_SYMBOL_LEN + 1) +
+	       /* NET_DM_ATTR_NETDEV_IFINDEX */
+	       nla_total_size(sizeof(u32)) +
+	       /* NET_DM_ATTR_NETDEV_NAME */
+	       nla_total_size(IFNAMSIZ + 1) +
+	       /* NET_DM_ATTR_TIMESTAMP */
+	       nla_total_size(sizeof(struct timespec)) +
+	       /* NET_DM_ATTR_PAYLOAD */
+	       nla_total_size(payload_len);
+}
+
+static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
+				     size_t payload_len)
+{
+	u64 pc = (u64)(uintptr_t) NET_DM_SKB_CB(skb)->pc;
+	char buf[NET_DM_MAX_SYMBOL_LEN];
+	struct nlattr *attr;
+	struct timespec ts;
+	void *hdr;
+
+	hdr = genlmsg_put(msg, 0, 0, &net_drop_monitor_family, 0,
+			  NET_DM_CMD_PACKET_ALERT);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u64_64bit(msg, NET_DM_ATTR_PC, pc, NET_DM_ATTR_PAD))
+		goto nla_put_failure;
+
+	snprintf(buf, sizeof(buf), "%pS", NET_DM_SKB_CB(skb)->pc);
+	if (nla_put_string(msg, NET_DM_ATTR_SYMBOL, buf))
+		goto nla_put_failure;
+
+	if (skb->dev &&
+	    nla_put_u32(msg, NET_DM_ATTR_NETDEV_IFINDEX, skb->dev->ifindex))
+		goto nla_put_failure;
+
+	if (skb->dev &&
+	    nla_put_string(msg, NET_DM_ATTR_NETDEV_NAME, skb->dev->name))
+		goto nla_put_failure;
+
+	if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
+	    nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
+		goto nla_put_failure;
+
+	if (!payload_len)
+		goto out;
+
+	attr = skb_put(msg, nla_total_size(payload_len));
+	attr->nla_type = NET_DM_ATTR_PAYLOAD;
+	attr->nla_len = nla_attr_size(payload_len);
+	if (skb_copy_bits(skb, 0, nla_data(attr), payload_len))
+		goto nla_put_failure;
+
+out:
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+#define NET_DM_MAX_PACKET_SIZE (0xffff - NLA_HDRLEN - NLA_ALIGNTO)
+
+static void net_dm_packet_report(struct sk_buff *skb)
+{
+	struct sk_buff *msg;
+	size_t payload_len;
+	int rc;
+
+	/* Make sure we start copying the packet from the MAC header */
+	if (skb->data > skb_mac_header(skb))
+		skb_push(skb, skb->data - skb_mac_header(skb));
+	else
+		skb_pull(skb, skb_mac_header(skb) - skb->data);
+
+	/* Ensure packet fits inside a single netlink attribute */
+	payload_len = min_t(size_t, skb->len, NET_DM_MAX_PACKET_SIZE);
+
+	msg = nlmsg_new(net_dm_packet_report_size(payload_len), GFP_KERNEL);
+	if (!msg)
+		goto out;
+
+	rc = net_dm_packet_report_fill(msg, skb, payload_len);
+	if (rc) {
+		nlmsg_free(msg);
+		goto out;
+	}
+
+	genlmsg_multicast(&net_drop_monitor_family, msg, 0, 0, GFP_KERNEL);
+
+out:
+	if (skb->dev)
+		dev_put(skb->dev);
+	consume_skb(skb);
+}
+
+static void net_dm_packet_work(struct work_struct *work)
+{
+	struct per_cpu_dm_data *data;
+	struct sk_buff_head list;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	data = container_of(work, struct per_cpu_dm_data, dm_alert_work);
+
+	__skb_queue_head_init(&list);
+
+	spin_lock_irqsave(&data->drop_queue.lock, flags);
+	skb_queue_splice_tail_init(&data->drop_queue, &list);
+	spin_unlock_irqrestore(&data->drop_queue.lock, flags);
+
+	while ((skb = __skb_dequeue(&list)))
+		net_dm_packet_report(skb);
+}
+
+static const struct net_dm_alert_ops net_dm_alert_packet_ops = {
+	.kfree_skb_probe	= net_dm_packet_trace_kfree_skb_hit,
+	.napi_poll_probe	= net_dm_packet_trace_napi_poll_hit,
+	.work_item_func		= net_dm_packet_work,
+};
+
+static const struct net_dm_alert_ops *net_dm_alert_ops_arr[] = {
+	[NET_DM_ALERT_MODE_SUMMARY]	= &net_dm_alert_summary_ops,
+	[NET_DM_ALERT_MODE_PACKET]	= &net_dm_alert_packet_ops,
+};
+
 static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 {
+	const struct net_dm_alert_ops *ops;
 	int cpu, rc;
 
+	ops = net_dm_alert_ops_arr[net_dm_alert_mode];
+
 	if (!try_module_get(THIS_MODULE)) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
 		return -ENODEV;
@@ -267,17 +468,17 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 	for_each_possible_cpu(cpu) {
 		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
 
-		INIT_WORK(&data->dm_alert_work, send_dm_alert);
+		INIT_WORK(&data->dm_alert_work, ops->work_item_func);
 		timer_setup(&data->send_timer, sched_send_work, 0);
 	}
 
-	rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+	rc = register_trace_kfree_skb(ops->kfree_skb_probe, NULL);
 	if (rc) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint");
 		goto err_module_put;
 	}
 
-	rc = register_trace_napi_poll(trace_napi_poll_hit, NULL);
+	rc = register_trace_napi_poll(ops->napi_poll_probe, NULL);
 	if (rc) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to napi_poll() tracepoint");
 		goto err_unregister_trace;
@@ -286,7 +487,7 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 	return 0;
 
 err_unregister_trace:
-	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+	unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL);
 err_module_put:
 	module_put(THIS_MODULE);
 	return rc;
@@ -295,10 +496,13 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 static void net_dm_trace_off_set(void)
 {
 	struct dm_hw_stat_delta *new_stat, *temp;
+	const struct net_dm_alert_ops *ops;
 	int cpu;
 
-	unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
-	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+	ops = net_dm_alert_ops_arr[net_dm_alert_mode];
+
+	unregister_trace_napi_poll(ops->napi_poll_probe, NULL);
+	unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL);
 
 	tracepoint_synchronize_unregister();
 
@@ -307,9 +511,18 @@ static void net_dm_trace_off_set(void)
 	 */
 	for_each_possible_cpu(cpu) {
 		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
+		struct sk_buff *skb;
 
 		del_timer_sync(&data->send_timer);
 		cancel_work_sync(&data->dm_alert_work);
+		/* If we deactivated a pending timer, some packets are still
+		 * queued and we need to free them.
+		 */
+		while ((skb = __skb_dequeue(&data->drop_queue))) {
+			if (skb->dev)
+				dev_put(skb->dev);
+			consume_skb(skb);
+		}
 	}
 
 	list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
@@ -351,12 +564,61 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 	return rc;
 }
 
+static int net_dm_alert_mode_get_from_info(struct genl_info *info,
+					   enum net_dm_alert_mode *p_alert_mode)
+{
+	u8 val;
+
+	val = nla_get_u8(info->attrs[NET_DM_ATTR_ALERT_MODE]);
+
+	switch (val) {
+	case NET_DM_ALERT_MODE_SUMMARY: /* fall-through */
+	case NET_DM_ALERT_MODE_PACKET:
+		*p_alert_mode = val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int net_dm_alert_mode_set(struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	enum net_dm_alert_mode alert_mode;
+	int rc;
+
+	if (!info->attrs[NET_DM_ATTR_ALERT_MODE])
+		return 0;
+
+	rc = net_dm_alert_mode_get_from_info(info, &alert_mode);
+	if (rc) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid alert mode");
+		return -EINVAL;
+	}
+
+	net_dm_alert_mode = alert_mode;
+
+	return 0;
+}
+
 static int net_dm_cmd_config(struct sk_buff *skb,
 			struct genl_info *info)
 {
-	NL_SET_ERR_MSG_MOD(info->extack, "Command not supported");
+	struct netlink_ext_ack *extack = info->extack;
+	int rc;
 
-	return -EOPNOTSUPP;
+	if (trace_state == TRACE_ON) {
+		NL_SET_ERR_MSG_MOD(extack, "Cannot configure drop monitor while tracing is on");
+		return -EOPNOTSUPP;
+	}
+
+	rc = net_dm_alert_mode_set(info);
+	if (rc)
+		return rc;
+
+	return 0;
 }
 
 static int net_dm_cmd_trace(struct sk_buff *skb,
@@ -411,6 +673,11 @@ static int dropmon_net_event(struct notifier_block *ev_block,
 	return NOTIFY_DONE;
 }
 
+static const struct nla_policy net_dm_nl_policy[NET_DM_ATTR_MAX + 1] = {
+	[NET_DM_ATTR_UNSPEC] = { .strict_start_type = NET_DM_ATTR_UNSPEC + 1 },
+	[NET_DM_ATTR_ALERT_MODE] = { .type = NLA_U8 },
+};
+
 static const struct genl_ops dropmon_ops[] = {
 	{
 		.cmd = NET_DM_CMD_CONFIG,
@@ -434,6 +701,8 @@ static struct genl_family net_drop_monitor_family __ro_after_init = {
 	.hdrsize        = 0,
 	.name           = "NET_DM",
 	.version        = 2,
+	.maxattr	= NET_DM_ATTR_MAX,
+	.policy		= net_dm_nl_policy,
 	.pre_doit	= net_dm_nl_pre_doit,
 	.post_doit	= net_dm_nl_post_doit,
 	.module		= THIS_MODULE,
@@ -479,6 +748,7 @@ static int __init init_net_drop_monitor(void)
 		INIT_WORK(&data->dm_alert_work, send_dm_alert);
 		timer_setup(&data->send_timer, sched_send_work, 0);
 		spin_lock_init(&data->lock);
+		skb_queue_head_init(&data->drop_queue);
 		reset_per_cpu_data(data);
 	}
 
@@ -509,6 +779,7 @@ static void exit_net_drop_monitor(void)
 		 * to this struct and can free the skb inside it
 		 */
 		kfree_skb(data->skb);
+		WARN_ON(!skb_queue_empty(&data->drop_queue));
 	}
 
 	BUG_ON(genl_unregister_family(&net_drop_monitor_family));
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 09/12] drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Currently, the configure command does not do anything but return an
error. Subsequent patches will enable the command to change various
configuration options such as alert mode and packet truncation.

Similar to other netlink-based configuration channels, make sure only
users with the CAP_NET_ADMIN capability set can execute this command.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 2f56a61c43c6..f441fb653567 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -416,6 +416,7 @@ static const struct genl_ops dropmon_ops[] = {
 		.cmd = NET_DM_CMD_CONFIG,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
 		.doit = net_dm_cmd_config,
+		.flags = GENL_ADMIN_PERM,
 	},
 	{
 		.cmd = NET_DM_CMD_START,
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 08/12] drop_monitor: Initialize timer and work item upon tracing enable
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

The timer and work item are currently initialized once during module
init, but subsequent patches will need to associate different functions
with the work item, based on the configured alert mode.

Allow subsequent patches to make that change by initializing and
de-initializing these objects during tracing enable and disable.

This also guarantees that once the request to disable tracing returns,
no more netlink notifications will be generated.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index e68dafaaebcd..2f56a61c43c6 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -257,13 +257,20 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
 
 static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 {
-	int rc;
+	int cpu, rc;
 
 	if (!try_module_get(THIS_MODULE)) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
 		return -ENODEV;
 	}
 
+	for_each_possible_cpu(cpu) {
+		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
+
+		INIT_WORK(&data->dm_alert_work, send_dm_alert);
+		timer_setup(&data->send_timer, sched_send_work, 0);
+	}
+
 	rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
 	if (rc) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint");
@@ -288,12 +295,23 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
 static void net_dm_trace_off_set(void)
 {
 	struct dm_hw_stat_delta *new_stat, *temp;
+	int cpu;
 
 	unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
 	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
 
 	tracepoint_synchronize_unregister();
 
+	/* Make sure we do not send notifications to user space after request
+	 * to stop tracing returns.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
+
+		del_timer_sync(&data->send_timer);
+		cancel_work_sync(&data->dm_alert_work);
+	}
+
 	list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
 		if (new_stat->dev == NULL) {
 			list_del_rcu(&new_stat->list);
@@ -481,14 +499,10 @@ static void exit_net_drop_monitor(void)
 	/*
 	 * Because of the module_get/put we do in the trace state change path
 	 * we are guarnateed not to have any current users when we get here
-	 * all we need to do is make sure that we don't have any running timers
-	 * or pending schedule calls
 	 */
 
 	for_each_possible_cpu(cpu) {
 		data = &per_cpu(dm_cpu_data, cpu);
-		del_timer_sync(&data->send_timer);
-		cancel_work_sync(&data->dm_alert_work);
 		/*
 		 * At this point, we should have exclusive access
 		 * to this struct and can free the skb inside it
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 07/12] drop_monitor: Split tracing enable / disable to different functions
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Subsequent patches will need to enable / disable tracing based on the
configured alerting mode.

Reduce the nesting level and prepare for the introduction of this
functionality by splitting the tracing enable / disable operations into
two different functions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 79 ++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 099000930736..e68dafaaebcd 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -255,11 +255,58 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
 	rcu_read_unlock();
 }
 
+static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
+{
+	int rc;
+
+	if (!try_module_get(THIS_MODULE)) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
+		return -ENODEV;
+	}
+
+	rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+	if (rc) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint");
+		goto err_module_put;
+	}
+
+	rc = register_trace_napi_poll(trace_napi_poll_hit, NULL);
+	if (rc) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to napi_poll() tracepoint");
+		goto err_unregister_trace;
+	}
+
+	return 0;
+
+err_unregister_trace:
+	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+err_module_put:
+	module_put(THIS_MODULE);
+	return rc;
+}
+
+static void net_dm_trace_off_set(void)
+{
+	struct dm_hw_stat_delta *new_stat, *temp;
+
+	unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
+	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
+
+	tracepoint_synchronize_unregister();
+
+	list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
+		if (new_stat->dev == NULL) {
+			list_del_rcu(&new_stat->list);
+			kfree_rcu(new_stat, rcu);
+		}
+	}
+
+	module_put(THIS_MODULE);
+}
+
 static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 {
 	int rc = 0;
-	struct dm_hw_stat_delta *new_stat = NULL;
-	struct dm_hw_stat_delta *temp;
 
 	if (state == trace_state) {
 		NL_SET_ERR_MSG_MOD(extack, "Trace state already set to requested state");
@@ -268,34 +315,10 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 
 	switch (state) {
 	case TRACE_ON:
-		if (!try_module_get(THIS_MODULE)) {
-			NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
-			rc = -ENODEV;
-			break;
-		}
-
-		rc |= register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
-		rc |= register_trace_napi_poll(trace_napi_poll_hit, NULL);
+		rc = net_dm_trace_on_set(extack);
 		break;
-
 	case TRACE_OFF:
-		rc |= unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
-		rc |= unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
-
-		tracepoint_synchronize_unregister();
-
-		/*
-		 * Clean the device list
-		 */
-		list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
-			if (new_stat->dev == NULL) {
-				list_del_rcu(&new_stat->list);
-				kfree_rcu(new_stat, rcu);
-			}
-		}
-
-		module_put(THIS_MODULE);
-
+		net_dm_trace_off_set();
 		break;
 	default:
 		rc = 1;
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 05/12] drop_monitor: Add extack support
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Add various extack messages to make drop_monitor more user friendly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 9080e62245b9..1d463c0d4bc5 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -241,7 +241,7 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
 	rcu_read_unlock();
 }
 
-static int set_all_monitor_traces(int state)
+static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 {
 	int rc = 0;
 	struct dm_hw_stat_delta *new_stat = NULL;
@@ -250,6 +250,7 @@ static int set_all_monitor_traces(int state)
 	mutex_lock(&net_dm_mutex);
 
 	if (state == trace_state) {
+		NL_SET_ERR_MSG_MOD(extack, "Trace state already set to requested state");
 		rc = -EAGAIN;
 		goto out_unlock;
 	}
@@ -257,6 +258,7 @@ static int set_all_monitor_traces(int state)
 	switch (state) {
 	case TRACE_ON:
 		if (!try_module_get(THIS_MODULE)) {
+			NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
 			rc = -ENODEV;
 			break;
 		}
@@ -303,6 +305,8 @@ static int set_all_monitor_traces(int state)
 static int net_dm_cmd_config(struct sk_buff *skb,
 			struct genl_info *info)
 {
+	NL_SET_ERR_MSG_MOD(info->extack, "Command not supported");
+
 	return -EOPNOTSUPP;
 }
 
@@ -311,9 +315,9 @@ static int net_dm_cmd_trace(struct sk_buff *skb,
 {
 	switch (info->genlhdr->cmd) {
 	case NET_DM_CMD_START:
-		return set_all_monitor_traces(TRACE_ON);
+		return set_all_monitor_traces(TRACE_ON, info->extack);
 	case NET_DM_CMD_STOP:
-		return set_all_monitor_traces(TRACE_OFF);
+		return set_all_monitor_traces(TRACE_OFF, info->extack);
 	}
 
 	return -EOPNOTSUPP;
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 06/12] drop_monitor: Use pre_doit / post_doit hooks
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 1d463c0d4bc5..099000930736 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -75,6 +75,20 @@ static int dm_delay = 1;
 static unsigned long dm_hw_check_delta = 2*HZ;
 static LIST_HEAD(hw_stats_list);
 
+static int net_dm_nl_pre_doit(const struct genl_ops *ops,
+			      struct sk_buff *skb, struct genl_info *info)
+{
+	mutex_lock(&net_dm_mutex);
+
+	return 0;
+}
+
+static void net_dm_nl_post_doit(const struct genl_ops *ops,
+				struct sk_buff *skb, struct genl_info *info)
+{
+	mutex_unlock(&net_dm_mutex);
+}
+
 static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
 {
 	size_t al;
@@ -247,12 +261,9 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 	struct dm_hw_stat_delta *new_stat = NULL;
 	struct dm_hw_stat_delta *temp;
 
-	mutex_lock(&net_dm_mutex);
-
 	if (state == trace_state) {
 		NL_SET_ERR_MSG_MOD(extack, "Trace state already set to requested state");
-		rc = -EAGAIN;
-		goto out_unlock;
+		return -EAGAIN;
 	}
 
 	switch (state) {
@@ -296,9 +307,6 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
 	else
 		rc = -EINPROGRESS;
 
-out_unlock:
-	mutex_unlock(&net_dm_mutex);
-
 	return rc;
 }
 
@@ -384,6 +392,8 @@ static struct genl_family net_drop_monitor_family __ro_after_init = {
 	.hdrsize        = 0,
 	.name           = "NET_DM",
 	.version        = 2,
+	.pre_doit	= net_dm_nl_pre_doit,
+	.post_doit	= net_dm_nl_post_doit,
 	.module		= THIS_MODULE,
 	.ops		= dropmon_ops,
 	.n_ops		= ARRAY_SIZE(dropmon_ops),
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 04/12] drop_monitor: Avoid multiple blank lines
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Remove multiple blank lines which are visually annoying and useless.

This suppresses the "Please don't use multiple blank lines" checkpatch
messages.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 35d31b007da4..9080e62245b9 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -300,7 +300,6 @@ static int set_all_monitor_traces(int state)
 	return rc;
 }
 
-
 static int net_dm_cmd_config(struct sk_buff *skb,
 			struct genl_info *info)
 {
@@ -427,7 +426,6 @@ static int __init init_net_drop_monitor(void)
 		reset_per_cpu_data(data);
 	}
 
-
 	goto out;
 
 out_unreg:
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 03/12] drop_monitor: Document scope of spinlock
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and
'send_timer' fields can be accessed concurrently by the CPU sending the
netlink notification to user space from the workqueue and the CPU
tracing kfree_skb(). This spinlock is meant to protect against that.

Document its scope and suppress the checkpatch message "spinlock_t
definition without comment".

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 000ec8b66d1e..35d31b007da4 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -52,7 +52,7 @@ static int trace_state = TRACE_OFF;
 static DEFINE_MUTEX(net_dm_mutex);
 
 struct per_cpu_dm_data {
-	spinlock_t		lock;
+	spinlock_t		lock;	/* Protects 'skb' and 'send_timer' */
 	struct sk_buff		*skb;
 	struct work_struct	dm_alert_work;
 	struct timer_list	send_timer;
-- 
2.21.0


^ permalink raw reply related

* [RFC PATCH net-next 00/12] drop_monitor: Capture dropped packets and metadata
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

So far drop monitor supported only one mode of operation in which a
summary of recent packet drops is periodically sent to user space as a
netlink event. The event only includes the drop location (program
counter) and number of drops in the last interval.

While this mode of operation allows one to understand if the system is
dropping packets, it is not sufficient if a more detailed analysis is
required. Both the packet itself and related metadata are missing.

This patchset extends drop monitor with another mode of operation where
the packet - potentially truncated - and metadata (e.g., drop location,
timestamp, netdev) are sent to user space as a netlink event. Thanks to
the extensible nature of netlink, more metadata can be added in the
future.

To avoid performing expensive operations in the context in which
kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU
skb drop list. The list is then processed in process context (using a
workqueue), where the netlink messages are allocated, prepared and
finally sent to user space.

As a follow-up, I plan to integrate drop monitor with devlink and allow
the latter to call into drop monitor to report hardware drops. In the
future, XDP drops can be added as well, thereby making drop monitor the
go-to netlink channel for diagnosing all packet drops.

Example usage with patched dropwatch [1] can be found here [2]. Example
dissection of drop monitor netlink events with patched wireshark [3] can
be found here [4]. I will submit both changes upstream after the kernel
changes are accepted.

Patches #1-#6 are just cleanups with no functional changes intended.

Patches #7-#8 perform small refactoring before the actual changes are
introduced in the last four patches.

[1] https://github.com/idosch/dropwatch/tree/packet-mode
[2] https://gist.github.com/idosch/7391b77da0b16182406189561fdfa1ef#file-gistfile1-txt
[3] https://github.com/idosch/wireshark/tree/drop-monitor
[4] https://gist.github.com/idosch/7391b77da0b16182406189561fdfa1ef#file-gistfile2-txt

Ido Schimmel (12):
  drop_monitor: Use correct error code
  drop_monitor: Rename and document scope of mutex
  drop_monitor: Document scope of spinlock
  drop_monitor: Avoid multiple blank lines
  drop_monitor: Add extack support
  drop_monitor: Use pre_doit / post_doit hooks
  drop_monitor: Split tracing enable / disable to different functions
  drop_monitor: Initialize timer and work item upon tracing enable
  drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration
  drop_monitor: Add packet alert mode
  drop_monitor: Allow truncation of dropped packets
  drop_monitor: Add a command to query current configuration

 include/uapi/linux/net_dropmon.h |  33 +++
 net/core/drop_monitor.c          | 492 ++++++++++++++++++++++++++++---
 2 files changed, 478 insertions(+), 47 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [RFC PATCH net-next 02/12] drop_monitor: Rename and document scope of mutex
From: Ido Schimmel @ 2019-07-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

The 'trace_state_mutex' does not only protect the global 'trace_state'
variable, but also the global 'hw_stats_list'.

Subsequent patches are going add more operations from user space to
drop_monitor and these all need to be mutually exclusive.

Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and
document its scope.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/drop_monitor.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index dcb4d2aeb2a8..000ec8b66d1e 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -43,7 +43,13 @@
  * netlink alerts
  */
 static int trace_state = TRACE_OFF;
-static DEFINE_MUTEX(trace_state_mutex);
+
+/* net_dm_mutex
+ *
+ * An overall lock guarding every operation coming from userspace.
+ * It also guards the global 'hw_stats_list' list.
+ */
+static DEFINE_MUTEX(net_dm_mutex);
 
 struct per_cpu_dm_data {
 	spinlock_t		lock;
@@ -241,7 +247,7 @@ static int set_all_monitor_traces(int state)
 	struct dm_hw_stat_delta *new_stat = NULL;
 	struct dm_hw_stat_delta *temp;
 
-	mutex_lock(&trace_state_mutex);
+	mutex_lock(&net_dm_mutex);
 
 	if (state == trace_state) {
 		rc = -EAGAIN;
@@ -289,7 +295,7 @@ static int set_all_monitor_traces(int state)
 		rc = -EINPROGRESS;
 
 out_unlock:
-	mutex_unlock(&trace_state_mutex);
+	mutex_unlock(&net_dm_mutex);
 
 	return rc;
 }
@@ -330,12 +336,12 @@ static int dropmon_net_event(struct notifier_block *ev_block,
 
 		new_stat->dev = dev;
 		new_stat->last_rx = jiffies;
-		mutex_lock(&trace_state_mutex);
+		mutex_lock(&net_dm_mutex);
 		list_add_rcu(&new_stat->list, &hw_stats_list);
-		mutex_unlock(&trace_state_mutex);
+		mutex_unlock(&net_dm_mutex);
 		break;
 	case NETDEV_UNREGISTER:
-		mutex_lock(&trace_state_mutex);
+		mutex_lock(&net_dm_mutex);
 		list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) {
 			if (new_stat->dev == dev) {
 				new_stat->dev = NULL;
@@ -346,7 +352,7 @@ static int dropmon_net_event(struct notifier_block *ev_block,
 				}
 			}
 		}
-		mutex_unlock(&trace_state_mutex);
+		mutex_unlock(&net_dm_mutex);
 		break;
 	}
 out:
-- 
2.21.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox