Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 11/18] net: dsa: mv88e6xxx: get STU entry on VTU GetNext
From: Andrew Lunn @ 2017-04-27 20:17 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-12-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:29AM -0400, Vivien Didelot wrote:
> Now that the code reads both VTU and STU data on VTU GetNext operation,
> fetch the STU entry data of a VTU entry at the same time.
> 
> The STU data bits are masked with the VTU data bits and they are now all
> read at the same time a VTU GetNext operation is issued.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
From: Logan Gunthorpe @ 2017-04-27 20:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <20170426074416.GA7936-jcswGhMUV9g@public.gmane.org>


On 26/04/17 01:44 AM, Christoph Hellwig wrote:
> I think we'll at least need a draft of those to make sense of these
> patches.  Otherwise they just look very clumsy.

Ok, what follows is a draft patch attempting to show where I'm thinking
of going with this. Obviously it will not compile because it assumes
the users throughout the kernel are a bit different than they are today.
Notably, there is no sg_page anymore.

There's also likely a ton of issues and arguments to have over a bunch
of the specifics below and I'd expect the concept to evolve more
as cleanup occurs. This itself is an evolution of the draft I posted
replying to you in my last RFC thread.

Also, before any of this is truly useful to us, pfn_t would have to
infect a few other places in the kernel.

Thanks,

Logan


diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index fad170b..85ef928 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -6,13 +6,14 @@
 #include <linux/bug.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/pfn_t.h>
 #include <asm/io.h>

 struct scatterlist {
 #ifdef CONFIG_DEBUG_SG
 	unsigned long	sg_magic;
 #endif
-	unsigned long	page_link;
+	pfn_t  		pfn;
 	unsigned int	offset;
 	unsigned int	length;
 	dma_addr_t	dma_address;
@@ -60,15 +61,68 @@ struct sg_table {

 #define SG_MAGIC	0x87654321

-/*
- * We overload the LSB of the page pointer to indicate whether it's
- * a valid sg entry, or whether it points to the start of a new
scatterlist.
- * Those low bits are there for everyone! (thanks mason :-)
- */
-#define sg_is_chain(sg)		((sg)->page_link & 0x01)
-#define sg_is_last(sg)		((sg)->page_link & 0x02)
-#define sg_chain_ptr(sg)	\
-	((struct scatterlist *) ((sg)->page_link & ~0x03))
+static inline bool sg_is_chain(struct scatterlist *sg)
+{
+	return sg->pfn.val & PFN_SG_CHAIN;
+}
+
+static inline bool sg_is_last(struct scatterlist *sg)
+{
+	return sg->pfn.val & PFN_SG_LAST;
+}
+
+static inline struct scatterlist *sg_chain_ptr(struct scatterlist *sg)
+{
+	unsigned long sgl = pfn_t_to_pfn(sg->pfn);
+	return (struct scatterlist *)(sgl << PAGE_SHIFT);
+}
+
+static inline bool sg_is_iomem(struct scatterlist *sg)
+{
+	return pfn_t_is_iomem(sg->pfn);
+}
+
+/**
+ * sg_assign_pfn - Assign a given pfn_t to an SG entry
+ * @sg:		    SG entry
+ * @pfn:	    The pfn
+ *
+ * Description:
+ *   Assign a pfn to sg entry. Also see sg_set_pfn(), the most commonly
used
+ *   variant.w
+ *
+ **/
+static inline void sg_assign_pfn(struct scatterlist *sg, pfn_t pfn)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+	BUG_ON(sg_is_chain(sg));
+	BUG_ON(pfn.val & (PFN_SG_CHAIN | PFN_SG_LAST));
+#endif
+
+	sg->pfn = pfn;
+}
+
+/**
+ * sg_set_pfn - Set sg entry to point at given pfn
+ * @sg:		 SG entry
+ * @pfn:	 The page
+ * @len:	 Length of data
+ * @offset:	 Offset into page
+ *
+ * Description:
+ *   Use this function to set an sg entry pointing at a pfn, never assign
+ *   the page directly. We encode sg table information in the lower bits
+ *   of the page pointer. See sg_pfn_t for looking up the pfn_t belonging
+ *   to an sg entry.
+ **/
+static inline void sg_set_pfn(struct scatterlist *sg, pfn_t pfn,
+			      unsigned int len, unsigned int offset)
+{
+	sg_assign_pfn(sg, pfn);
+	sg->offset = offset;
+	sg->length = len;
+}

 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -82,18 +136,13 @@ struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page
*page)
 {
-	unsigned long page_link = sg->page_link & 0x3;
+	if (!page) {
+		pfn_t null_pfn = {0};
+		sg_assign_pfn(sg, null_pfn);
+		return;
+	}

-	/*
-	 * In order for the low bit stealing approach to work, pages
-	 * must be aligned at a 32-bit boundary as a minimum.
-	 */
-	BUG_ON((unsigned long) page & 0x03);
-#ifdef CONFIG_DEBUG_SG
-	BUG_ON(sg->sg_magic != SG_MAGIC);
-	BUG_ON(sg_is_chain(sg));
-#endif
-	sg->page_link = page_link | (unsigned long) page;
+	sg_assign_pfn(sg, page_to_pfn_t(page));
 }

 /**
@@ -106,8 +155,7 @@ static inline void sg_assign_page(struct scatterlist
*sg, struct page *page)
  * Description:
  *   Use this function to set an sg entry pointing at a page, never assign
  *   the page directly. We encode sg table information in the lower bits
- *   of the page pointer. See sg_page() for looking up the page belonging
- *   to an sg entry.
+ *   of the page pointer.
  *
  **/
 static inline void sg_set_page(struct scatterlist *sg, struct page *page,
@@ -118,13 +166,53 @@ static inline void sg_set_page(struct scatterlist
*sg, struct page *page,
 	sg->length = len;
 }

-static inline struct page *sg_page(struct scatterlist *sg)
+/**
+ * sg_pfn_t - Return the pfn_t for the sg
+ * @sg:		 SG entry
+ *
+ **/
+static inline pfn_t sg_pfn_t(struct scatterlist *sg)
 {
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 	BUG_ON(sg_is_chain(sg));
 #endif
-	return (struct page *)((sg)->page_link & ~0x3);
+
+	return sg->pfn;
+}
+
+/**
+ * sg_to_mappable_page - Try to return a struct page safe for general
+ *	use in the kernel
+ * @sg:		 SG entry
+ * @page:	 A pointer to the returned page
+ *
+ * Description:
+ *   If possible, return a mappable page that's safe for use around the
+ *   kernel. Should only be used in legacy situations. sg_pfn_t() is a
+ *   better choice for new code. This is deliberately more awkward than
+ *   the old sg_page to enforce the __must_check rule and discourage future
+ *   use.
+ *
+ *   An example where this is required is in nvme-fabrics: a page from an
+ *   sgl is placed into a bio. This function would be required until we can
+ *   convert bios to use pfn_t as well. Similar issues with skbs, etc.
+ **/
+static inline __must_check int sg_to_mappable_page(struct scatterlist *sg,
+						   struct page **ret)
+{
+	struct page *pg;
+
+	if (unlikely(sg_is_iomem(sg)))
+		return -EFAULT;
+
+	pg = pfn_t_to_page(sg->pfn);
+	if (unlikely(!pg))
+		return -EFAULT;
+
+	*ret = pg;
+
+	return 0;
 }

 #define SG_KMAP		     (1 << 0)	/* create a mapping with kmap */
@@ -167,8 +255,19 @@ static inline void *sg_map(struct scatterlist *sg,
size_t offset, int flags)
 	unsigned int pg_off;
 	void *ret;

+	if (unlikely(sg_is_iomem(sg))) {
+		ret = ERR_PTR(-EFAULT);
+		goto out;
+	}
+
+	pg = pfn_t_to_page(sg->pfn);
+	if (unlikely(!pg)) {
+		ret = ERR_PTR(-EFAULT);
+		goto out;
+	}
+
 	offset += sg->offset;
-	pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
+	pg = nth_page(pg, offset >> PAGE_SHIFT);
 	pg_off = offset_in_page(offset);

 	if (flags & SG_KMAP_ATOMIC)
@@ -178,12 +277,7 @@ static inline void *sg_map(struct scatterlist *sg,
size_t offset, int flags)
 	else
 		ret = ERR_PTR(-EINVAL);

-	/*
-	 * In theory, this can't happen yet. Once we start adding
-	 * unmapable memory, it also shouldn't happen unless developers
-	 * start putting unmappable struct pages in sgls and passing
-	 * it to code that doesn't support it.
-	 */
+out:
 	BUG_ON(flags & SG_MAP_MUST_NOT_FAIL && IS_ERR(ret));

 	return ret;
@@ -202,9 +296,15 @@ static inline void *sg_map(struct scatterlist *sg,
size_t offset, int flags)
 static inline void sg_unmap(struct scatterlist *sg, void *addr,
 			    size_t offset, int flags)
 {
-	struct page *pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
+	struct page *pg;
 	unsigned int pg_off = offset_in_page(offset);

+	pg = pfn_t_to_page(sg->pfn);
+	if (unlikely(!pg))
+		return;
+
+	pg = nth_page(pg, offset >> PAGE_SHIFT);
+
 	if (flags & SG_KMAP_ATOMIC)
 		kunmap_atomic(addr - sg->offset - pg_off);
 	else if (flags & SG_KMAP)
@@ -246,17 +346,18 @@ static inline void sg_set_buf(struct scatterlist
*sg, const void *buf,
 static inline void sg_chain(struct scatterlist *prv, unsigned int
prv_nents,
 			    struct scatterlist *sgl)
 {
+	pfn_t pfn;
+	unsigned long _sgl = (unsigned long) sgl;
+
 	/*
 	 * offset and length are unused for chain entry.  Clear them.
 	 */
 	prv[prv_nents - 1].offset = 0;
 	prv[prv_nents - 1].length = 0;

-	/*
-	 * Set lowest bit to indicate a link pointer, and make sure to clear
-	 * the termination bit if it happens to be set.
-	 */
-	prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+	BUG_ON(_sgl & PAGE_MASK);
+	pfn = __pfn_to_pfn_t(_sgl >> PAGE_SHIFT, PFN_SG_CHAIN);
+	prv[prv_nents - 1].pfn = pfn;
 }

 /**
@@ -276,8 +377,8 @@ static inline void sg_mark_end(struct scatterlist *sg)
 	/*
 	 * Set termination bit, clear potential chain bit
 	 */
-	sg->page_link |= 0x02;
-	sg->page_link &= ~0x01;
+	sg->pfn.val |= PFN_SG_LAST;
+	sg->pfn.val &= ~PFN_SG_CHAIN;
 }

 /**
@@ -293,7 +394,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 #endif
-	sg->page_link &= ~0x02;
+	sg->pfn.val &= ~PFN_SG_LAST;
 }

 /**
@@ -301,14 +402,13 @@ static inline void sg_unmark_end(struct
scatterlist *sg)
  * @sg:	     SG entry
  *
  * Description:
- *   This calls page_to_phys() on the page in this sg entry, and adds the
- *   sg offset. The caller must know that it is legal to call
page_to_phys()
- *   on the sg page.
+ *   This calls pfn_t_to_phys() on the pfn in this sg entry, and adds the
+ *   sg offset.
  *
  **/
 static inline dma_addr_t sg_phys(struct scatterlist *sg)
 {
-	return page_to_phys(sg_page(sg)) + sg->offset;
+	return pfn_t_to_phys(sg->pfn) + sg->offset;
 }

 /**
@@ -323,7 +423,12 @@ static inline dma_addr_t sg_phys(struct scatterlist
*sg)
  **/
 static inline void *sg_virt(struct scatterlist *sg)
 {
-	return page_address(sg_page(sg)) + sg->offset;
+	struct page *pg = pfn_t_to_page(sg->pfn);
+
+	BUG_ON(sg_is_iomem(sg));
+	BUG_ON(!pg);
+
+	return page_address(pg) + sg->offset;
 }

 int sg_nents(struct scatterlist *sg);
@@ -422,10 +527,18 @@ void __sg_page_iter_start(struct sg_page_iter *piter,
 /**
  * sg_page_iter_page - get the current page held by the page iterator
  * @piter:	page iterator holding the page
+ *
+ * This function will require some cleanup. Some users simply mark
+ * attributes of the pages which are fine, others actually map it and
+ * will require some saftey there.
  */
 static inline struct page *sg_page_iter_page(struct sg_page_iter *piter)
 {
-	return nth_page(sg_page(piter->sg), piter->sg_pgoffset);
+	struct page *pg = pfn_t_to_page(piter->sg->pfn);
+	if (!pg)
+		return NULL;
+
+	return nth_page(pg, piter->sg_pgoffset);
 }

 /**
@@ -468,11 +581,13 @@ static inline dma_addr_t
sg_page_iter_dma_address(struct sg_page_iter *piter)
 #define SG_MITER_ATOMIC		(1 << 0)	 /* use kmap_atomic */
 #define SG_MITER_TO_SG		(1 << 1)	/* flush back to phys on unmap */
 #define SG_MITER_FROM_SG	(1 << 2)	/* nop */
+#define SG_MITER_SUPPORTS_IOMEM (1 << 3)        /* iteratee supports
iomem */

 struct sg_mapping_iter {
 	/* the following three fields can be accessed directly */
 	struct page		*page;		/* currently mapped page */
 	void			*addr;		/* pointer to the mapped area */
+	void __iomem            *ioaddr;        /* pointer to iomem */
 	size_t			length;		/* length of the mapped area */
 	size_t			consumed;	/* number of consumed bytes */
 	struct sg_page_iter	piter;		/* page iterator */
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c6cf822..2d1c58c 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -571,6 +571,8 @@ EXPORT_SYMBOL(sg_miter_skip);
  */
 bool sg_miter_next(struct sg_mapping_iter *miter)
 {
+	void *addr;
+
 	sg_miter_stop(miter);

 	/*
@@ -580,13 +582,25 @@ bool sg_miter_next(struct sg_mapping_iter *miter)
 	if (!sg_miter_get_next_page(miter))
 		return false;

+	if (sg_is_iomem(miter->piter.sg) &&
+	    !(miter->__flags & SG_MITER_SUPPORTS_IOMEM))
+		return false;
+
 	miter->page = sg_page_iter_page(&miter->piter);
 	miter->consumed = miter->length = miter->__remaining;

 	if (miter->__flags & SG_MITER_ATOMIC)
-		miter->addr = kmap_atomic(miter->page) + miter->__offset;
+		addr = kmap_atomic(miter->page) + miter->__offset;
 	else
-		miter->addr = kmap(miter->page) + miter->__offset;
+		addr = kmap(miter->page) + miter->__offset;
+
+	if (sg_is_iomem(miter->piter.sg)) {
+		miter->addr = NULL;
+		miter->ioaddr = (void * __iomem) addr;
+	} else {
+		miter->addr = addr;
+		miter->ioaddr = NULL;
+	}

 	return true;
 }
@@ -651,7 +665,7 @@ size_t sg_copy_buffer(struct scatterlist *sgl,
unsigned int nents, void *buf,
 {
 	unsigned int offset = 0;
 	struct sg_mapping_iter miter;
-	unsigned int sg_flags = SG_MITER_ATOMIC;
+	unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_SUPPORTS_IOMEM;

 	if (to_buffer)
 		sg_flags |= SG_MITER_FROM_SG;
@@ -668,10 +682,17 @@ size_t sg_copy_buffer(struct scatterlist *sgl,
unsigned int nents, void *buf,

 		len = min(miter.length, buflen - offset);

-		if (to_buffer)
-			memcpy(buf + offset, miter.addr, len);
-		else
-			memcpy(miter.addr, buf + offset, len);
+		if (miter.addr) {
+			if (to_buffer)
+				memcpy(buf + offset, miter.addr, len);
+			else
+				memcpy(miter.addr, buf + offset, len);
+		} else if (miter.ioaddr) {
+			if (to_buffer)
+				memcpy_fromio(buf + offset, miter.addr, len);
+			else
+				memcpy_toio(miter.addr, buf + offset, len);
+		}

 		offset += len;
 	}

^ permalink raw reply related

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: David Ahern @ 2017-04-27 19:59 UTC (permalink / raw)
  To: vyasevic, Vladislav Yasevich, netdev; +Cc: roopa, Jiri Pirko
In-Reply-To: <7b5396ae-0cf3-2a1e-9c49-5d6f031adf58@redhat.com>

On 4/27/17 1:43 PM, Vlad Yasevich wrote:
>> For example, NETDEV_CHANGEINFODATA is only for bonds though nothing
>> about the name suggests it is a bonding notification. This one was added
>> specifically to notify userspace (d4261e5650004), yet seems to happen
>> only during a changelink and that already generates a RTM_NEWLINK
>> message via do_setlink. Since the rtnetlink_event message does not
>> contain anything "NETDEV_CHANGEINFODATA" related what purpose does it
>> really serve besides duplicating netlink messages to userspace.
>>
> 
> I am not sure about this one, but if you have an app trying to monitor
> for this event, it can't really since there is no info in the netlink message.

I cc'ed Jiri on this thread hoping he would explain the intent.

I propose it gets removed.

^ permalink raw reply

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Vlad Yasevich @ 2017-04-27 19:51 UTC (permalink / raw)
  To: Roopa Prabhu, David Ahern
  Cc: Vladislav Yasevich, netdev@vger.kernel.org, Jiri Pirko
In-Reply-To: <CAJieiUi6Uu-=QBMfBXOjrEDXLCZx=mFo9SyLhgb+CcG2=uiyRA@mail.gmail.com>

On 04/24/2017 11:14 AM, Roopa Prabhu wrote:
> On Sun, Apr 23, 2017 at 6:07 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>>
>> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
>>> @@ -1276,9 +1277,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
>>>       return err;
>>>  }
>>>
>>> +static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
>>> +{
>>> +     u32 rtnl_event;
>>> +
>>> +     switch (event) {
>>> +     case NETDEV_REBOOT:
>>> +             rtnl_event = IFLA_EVENT_REBOOT;
>>> +             break;
>>> +     case NETDEV_FEAT_CHANGE:
>>> +             rtnl_event = IFLA_EVENT_FEAT_CHANGE;
>>> +             break;
>>> +     case NETDEV_BONDING_FAILOVER:
>>> +             rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
>>> +             break;
>>> +     case NETDEV_NOTIFY_PEERS:
>>> +             rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
>>> +             break;
>>> +     case NETDEV_RESEND_IGMP:
>>> +             rtnl_event = IFLA_EVENT_RESEND_IGMP;
>>> +             break;
>>> +     case NETDEV_CHANGEINFODATA:
>>> +             rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
>>> +             break;
>>> +     default:
>>> +             return 0;
>>> +     }
>>> +
>>> +     return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
>>> +}
>>> +
>>
>> I still have doubts about encoding kernel events into a uapi.
> 
> agree. I don't see why user-space will need NETDEV_CHANGEINFODATA and
> others david listed.
> 

Well, I am not sure about CHANGEINFODATA as well, but I can see use
cases for others.

> My other concerns are, once we have this exposed to user-space and
> user-space starts relying on it, it will need accurate information and
> will expect to have this event information all the time.
> IIUC, we cannot cover multiple events in a single notification and not
> all link notifications will contain an IFLA_EVENT attribute.

Uhm...  If the rtnetlink message was a result of an event, it will have
an IFLA_EVENT.  If a message is something else, then it will not have
an event.  That's the point.  Not all netlink attributes are in every
netlink message.

> In other
> words, we will be telling user-space to not expect that the kernel
> will send IFLA_EVENT every time.
> 

No, we are telling the user that if it is interested in a specific event
(let's say NOTIFY_PEERS or RESEND_IGMP), then it now can monitor netlink
traffic for those events.
As things stand right now, that's not possible.

I've done this specifically for all events for which we currently generate
a netlink message.

The only concern I have is that if in the future we remove a certain netdev
event, it may impact applications.  But we may be doing it right now as well,
only silently, and the apps may have to find some ways to work around it.

There is also a potential to improve libnl caching and not invalidate the
cached data for certain events.

-vlad
> 
> 
>>
>> For example, NETDEV_CHANGEINFODATA is only for bonds though nothing
>> about the name suggests it is a bonding notification. This one was added
>> specifically to notify userspace (d4261e5650004), yet seems to happen
>> only during a changelink and that already generates a RTM_NEWLINK
>> message via do_setlink. Since the rtnetlink_event message does not
>> contain anything "NETDEV_CHANGEINFODATA" related what purpose does it
>> really serve besides duplicating netlink messages to userspace.
>>
>> The REBOOT, IGMP, FEAT_CHANGE and BONDING_FAILOVER seem to be unique
>> messages (code analysis only) which I get for notifying userspace.
>>
>> NETDEV_NOTIFY_PEERS is not so clear in how often it duplicates other
>> messages.

^ permalink raw reply

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Vlad Yasevich @ 2017-04-27 19:43 UTC (permalink / raw)
  To: David Ahern, Vladislav Yasevich, netdev; +Cc: roopa, Jiri Pirko
In-Reply-To: <877efb54-2aef-4d1e-c0b4-2ce6aa6562df@cumulusnetworks.com>

On 04/23/2017 09:07 PM, David Ahern wrote:
> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
>> @@ -1276,9 +1277,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
>>  	return err;
>>  }
>>  
>> +static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
>> +{
>> +	u32 rtnl_event;
>> +
>> +	switch (event) {
>> +	case NETDEV_REBOOT:
>> +		rtnl_event = IFLA_EVENT_REBOOT;
>> +		break;
>> +	case NETDEV_FEAT_CHANGE:
>> +		rtnl_event = IFLA_EVENT_FEAT_CHANGE;
>> +		break;
>> +	case NETDEV_BONDING_FAILOVER:
>> +		rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
>> +		break;
>> +	case NETDEV_NOTIFY_PEERS:
>> +		rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
>> +		break;
>> +	case NETDEV_RESEND_IGMP:
>> +		rtnl_event = IFLA_EVENT_RESEND_IGMP;
>> +		break;
>> +	case NETDEV_CHANGEINFODATA:
>> +		rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
>> +		break;
>> +	default:
>> +		return 0;
>> +	}
>> +
>> +	return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
>> +}
>> +
> 
> I still have doubts about encoding kernel events into a uapi.
> 
> For example, NETDEV_CHANGEINFODATA is only for bonds though nothing
> about the name suggests it is a bonding notification. This one was added
> specifically to notify userspace (d4261e5650004), yet seems to happen
> only during a changelink and that already generates a RTM_NEWLINK
> message via do_setlink. Since the rtnetlink_event message does not
> contain anything "NETDEV_CHANGEINFODATA" related what purpose does it
> really serve besides duplicating netlink messages to userspace.
> 

I am not sure about this one, but if you have an app trying to monitor
for this event, it can't really since there is no info in the netlink message.

> The REBOOT, IGMP, FEAT_CHANGE and BONDING_FAILOVER seem to be unique
> messages (code analysis only) which I get for notifying userspace.
> 
> NETDEV_NOTIFY_PEERS is not so clear in how often it duplicates other
> messages.
> 

This one sometimes happens in addition to bonding failover, but not always
(it depends on bonding mode).
For me, having access to this particular event is important as it will
used to trigger a guest announcements.

-vlad

^ permalink raw reply

* Re: [PATCH net-next 10/18] net: dsa: mv88e6xxx: move STU GetNext operation
From: Andrew Lunn @ 2017-04-27 19:43 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-11-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:28AM -0400, Vivien Didelot wrote:
> Extract the generic portion of code to issue an STU GetNext operation,
> which will be used in other implementations.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net 2/2] vxlan: do not output confusing error message
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner
In-Reply-To: <cover.1493320999.git.jbenc@redhat.com>

The message "Cannot bind port X, err=Y" creates only confusion. In metadata
based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
no error message should be printed. But when IPv6 tunnel was requested, such
failure is fatal. The vxlan_socket_create does not know when the error is
harmless and when it's not.

Instead of passing such information down to vxlan_socket_create, remove the
message completely. It's not useful. We propagate the error code up to the
user space and the port number comes from the user space. There's nothing in
the message that the process creating vxlan interface does not know.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 118e508f1889..27cba699d83a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2754,8 +2754,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 
 	sock = vxlan_create_sock(net, ipv6, port, flags);
 	if (IS_ERR(sock)) {
-		pr_info("Cannot bind port %d, err=%ld\n", ntohs(port),
-			PTR_ERR(sock));
 		kfree(vs);
 		return ERR_CAST(sock);
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 1/2] vxlan: correctly handle ipv6.disable module parameter
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner
In-Reply-To: <cover.1493320999.git.jbenc@redhat.com>

When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bdb6ae16d4a8..118e508f1889 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2818,17 +2818,21 @@ static int __vxlan_sock_add(struct vxlan_dev *vxlan, bool ipv6)
 
 static int vxlan_sock_add(struct vxlan_dev *vxlan)
 {
-	bool ipv6 = vxlan->flags & VXLAN_F_IPV6;
 	bool metadata = vxlan->flags & VXLAN_F_COLLECT_METADATA;
+	bool ipv6 = vxlan->flags & VXLAN_F_IPV6 || metadata;
+	bool ipv4 = !ipv6 || metadata;
 	int ret = 0;
 
 	RCU_INIT_POINTER(vxlan->vn4_sock, NULL);
 #if IS_ENABLED(CONFIG_IPV6)
 	RCU_INIT_POINTER(vxlan->vn6_sock, NULL);
-	if (ipv6 || metadata)
+	if (ipv6) {
 		ret = __vxlan_sock_add(vxlan, true);
+		if (ret < 0 && ret != -EAFNOSUPPORT)
+			ipv4 = false;
+	}
 #endif
-	if (!ret && (!ipv6 || metadata))
+	if (ipv4)
 		ret = __vxlan_sock_add(vxlan, false);
 	if (ret < 0)
 		vxlan_sock_release(vxlan);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 0/2] vxlan: do not error out on disabled IPv6
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner

This patchset fixes a bug with metadata based tunnels when booted with
ipv6.disable=1.


Jiri Benc (2):
  vxlan: correctly handle ipv6.disable module parameter
  vxlan: do not output confusing error message

 drivers/net/vxlan.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next 1/2] rtnetlink: Disable notification for NETDEV_NAMECHANGE event
From: Vlad Yasevich @ 2017-04-27 19:26 UTC (permalink / raw)
  To: David Ahern, Vladislav Yasevich, netdev
In-Reply-To: <becfb3d4-5e9d-800b-fc67-1e99f4b14db9@cumulusnetworks.com>

On 04/21/2017 02:08 PM, David Ahern wrote:
> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
>> The data signaling name change is already provided at
>> the end of do_setlink().  This event handler just generates
>> a duplicate announcement.  Disable it.
>>
>> CC: David Ahern <dsa@cumulusnetworks.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>  net/core/rtnetlink.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> index 0ee5479..e8e6816 100644
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
>> @@ -4123,7 +4123,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
>>  
>>  	switch (event) {
>>  	case NETDEV_REBOOT:
>> -	case NETDEV_CHANGENAME:
>>  	case NETDEV_FEAT_CHANGE:
>>  	case NETDEV_BONDING_FAILOVER:
>>  	case NETDEV_NOTIFY_PEERS:
>>
> 
> 
> I only see one using the ip monitor.
> 
> $ ip li set foobar name fubar
> 
> generates these 3 messages:
> 
> [LINK]12: fubar: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN
> group default
>     link/ether 76:cd:72:dd:2a:cb brd ff:ff:ff:ff:ff:ff
> Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
> [NETCONF]ipv4 dev dummy2 forwarding on rp_filter off mc_forwarding off
> proxy_neigh off ignore_routes_with_linkdown off
> Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
> [NETCONF]ipv6 dev dummy2 forwarding on mc_forwarding off proxy_neigh off
> ignore_routes_with_linkdown off
> 
> do_setlink only sets DO_SETLINK_MODIFIED so a name change alone will not
> generate 2 messages.
> 

Actually, it has nothing to do with above flag.  Setting DO_SETLINK_MODIFIED
will still generate notifications, but only if the device is UP.  However,
it looks like link name change can only be done when the link is down.  As
a result, netdev_state_change will not report it, so we only see the 'event'
one.

So this is patch isn't needed, but only as a kind-of side-effect..

-vlad

^ permalink raw reply

* Re: [PATCH net-next 0/9] support unique MAC addresses for slave devices
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-04-27 19:21 UTC (permalink / raw)
  To: Marco Chiappero
  Cc: linux-netdev, David S . Miller, Jeff Kirsher, Alexander Duyck,
	Sainath Grandhi
In-Reply-To: <20170427145142.15830-1-marco.chiappero@intel.com>

On Thu, Apr 27, 2017 at 7:51 AM, Marco Chiappero
<marco.chiappero@intel.com> wrote:
> Currently every slave device gets assigned the same MAC address, by
> having it copied from the master interface. Since some code paths
> depend on this identity, changing the MAC address on slave interfaces
> is not supported. However identical MAC addresses can pose problems to
> management and orchestration software that correctly expect network
> interfaces on the same segment to have unique addresses.
>
Please understand that there are two distinct drivers IPvlan and
MACvlan. They both exist together for good reasons and are trying to
cater for different needs. I would love to combine them together if we
don't mess / miss the goodies each of them have to offer... otherwise
*NO*! Having said that if management / orchestration software has
problems then clearly you should not use IPvlan for that use case.

> Patches 1-8 include style fixes and refactoring (patch 9 depends upon)
> that improve the overal quality and make the intruduction of the
> feature straightforward.
>
Lots of this fall into I-say-potato-you-say-... category. My way of
thinking / organizing code is different than yours and you don't have
to like mine and I don't have to like yours.

> Patch 9 enables slave devices to own unique MAC addresses and change
> such addresses live, fixing lack of support and a related bug, as
> MAC address changes on master were not propagated to slave devices.
> In order to preserve the main peculiarity of this driver, that is
> exposing only a single MAC address for outbound traffic, frames
> egressing from master are now effectively masquerated when working in
> L2 mode.
>
This enhancement is, however, coming via packet-header rewrite for
every Tx/Rx packet which defeats the purpose. The only good thing that
came in light is the mac-addr change propagation from master issue;
but if the fix is coming as a side-effect of header rewrite then it's
not an acceptable fix either. This can be simply fixed by changing a
line in ipvlan_hard_header().

> Marco Chiappero (9):
>   ipvlan: fix coding style for the ipvlan tree
>   ipvlan: refactor ipvlan_process_multicast for readability
>   ipvlan: replace ipvlan_rcv_frame
>   ipvlan: rework the IP lookup function
>   ipvlan: improve and uniform naming
>   ipvlan: reposition three functions
>   ipvlan: relocate ipvlan_skb_crossing_ns calls
>   ipvlan: improve compiler hints
>   ipvlan: introduce individual MAC addresses
>
>  drivers/net/ipvlan/ipvlan.h      |   2 +-
>  drivers/net/ipvlan/ipvlan_core.c | 592 ++++++++++++++++++++-------------------
>  drivers/net/ipvlan/ipvlan_main.c |  49 ++--
>  drivers/net/ipvlan/ipvtap.c      |   1 +
>  4 files changed, 333 insertions(+), 311 deletions(-)
>
> --
> 2.9.3
>
> --------------------------------------------------------------
> Intel Research and Development Ireland Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
>
>
> This e-mail and any attachments may contain confidential material for the sole
> use of the intended recipient(s). Any review or distribution by others is
> strictly prohibited. If you are not the intended recipient, please contact the
> sender and delete all copies.
>

^ permalink raw reply

* Re: [PATCH net-next 09/18] net: dsa: mv88e6xxx: move VTU Data accessors
From: Andrew Lunn @ 2017-04-27 19:19 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-10-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:27AM -0400, Vivien Didelot wrote:
> The code to access the VTU Data registers currently only supports the
> 88E6185 family and alike: 2-bit membership adjacent to 2-bit port state.
> 
> Even though the 88E6352 family introduced an indirect table to program
> the VLAN Spanning Tree states, the usage of the VTU Data registers
> remains the same regardless the VTU or STU operation.
> 
> Now that the mv88e6xxx_vtu_entry structure contains both port membership
> and states data, factorize the code to access them in global1_vtu.c.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net-next] igb: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2017-04-27 19:09 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Arnd Bergmann, Alexander Duyck, David S. Miller, Jacob Keller,
	Todd Fujinaka, Yury Kylulin, intel-wired-lan, netdev,
	linux-kernel

The new wake function is only used by the suspend/resume handlers that
are defined in inside of an #ifdef, which can cause this harmless
warning:

drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]

Removing the #ifdef, instead using a __maybe_unused annotation
simplifies the code and avoids the warning.

Fixes: b90fa8763560 ("igb: Enable reading of wake up packet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 1cf74aa4ebd9..2d5bdb1fd37d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -191,10 +191,7 @@ static int igb_disable_sriov(struct pci_dev *dev);
 static int igb_pci_disable_sriov(struct pci_dev *dev);
 #endif
 
-#ifdef CONFIG_PM
-#ifdef CONFIG_PM_SLEEP
 static int igb_suspend(struct device *);
-#endif
 static int igb_resume(struct device *);
 static int igb_runtime_suspend(struct device *dev);
 static int igb_runtime_resume(struct device *dev);
@@ -204,7 +201,6 @@ static const struct dev_pm_ops igb_pm_ops = {
 	SET_RUNTIME_PM_OPS(igb_runtime_suspend, igb_runtime_resume,
 			igb_runtime_idle)
 };
-#endif
 static void igb_shutdown(struct pci_dev *);
 static int igb_pci_sriov_configure(struct pci_dev *dev, int num_vfs);
 #ifdef CONFIG_IGB_DCA
@@ -8015,9 +8011,7 @@ static void igb_deliver_wake_packet(struct net_device *netdev)
 	netif_rx(skb);
 }
 
-#ifdef CONFIG_PM
-#ifdef CONFIG_PM_SLEEP
-static int igb_suspend(struct device *dev)
+static int __maybe_unused igb_suspend(struct device *dev)
 {
 	int retval;
 	bool wake;
@@ -8036,9 +8030,8 @@ static int igb_suspend(struct device *dev)
 
 	return 0;
 }
-#endif /* CONFIG_PM_SLEEP */
 
-static int igb_resume(struct device *dev)
+static int __maybe_unused igb_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -8092,7 +8085,7 @@ static int igb_resume(struct device *dev)
 	return err;
 }
 
-static int igb_runtime_idle(struct device *dev)
+static int __maybe_unused igb_runtime_idle(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -8104,7 +8097,7 @@ static int igb_runtime_idle(struct device *dev)
 	return -EBUSY;
 }
 
-static int igb_runtime_suspend(struct device *dev)
+static int __maybe_unused igb_runtime_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	int retval;
@@ -8124,11 +8117,10 @@ static int igb_runtime_suspend(struct device *dev)
 	return 0;
 }
 
-static int igb_runtime_resume(struct device *dev)
+static int __maybe_unused igb_runtime_resume(struct device *dev)
 {
 	return igb_resume(dev);
 }
-#endif /* CONFIG_PM */
 
 static void igb_shutdown(struct pci_dev *pdev)
 {
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Casey Leedom @ 2017-04-27 19:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexander Duyck
  Cc: Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	linux-pci@vger.kernel.org, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org,
	Robin Murphy, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20170427171938.GA10705@bhelgaas-glaptop.roam.corp.google.com>

  Thanks for adding me to the Cc list Bjorn.  Hopefully my message will make
it out to the netdev and linux-pci lists -- I'm not currently subscribed to
them.  I've explicitly marked this message to be sent in plain text but
modern email agents suck with respect to this. (sigh) I have officially
become a curmudgeon. 

  So, officially, Relaxed Ordering should be a Semantic Noop as far as PCIe
transfers are concerned, as long as you don't care what order the PCIe
Transaction Layer Packets are processed in by the target PCIe Fabric End
Point.

  Basically, if you have some number of back-to-back PCIe TLPs between two
Fabric End Points {A} -> {B} which have the Relaxed Ordering Attribute set,
the End Point {B} receiving these RO TLPs may process them in any order it
likes.  When a TLP without Relaxed Ordering is sent {A} -> {B}, all
preceding TLPs with Relaxed Ordering set must be processed by {B} prior to
processing the TLP without Relaxed Ordering set.  In this sense, a TLP
without Relaxed Ordering set is something akin to a "memory barrier".

  All of this is covered in Section 2.4.1 of the PCIe 3.0 Specification (PCI
Express(r) Base Specification Revision 3.0 November 10, 2010).

  The advantage of using Relaxed Ordering (which is heavily used when
sending data to Graphics Cards as I understand it), is that the PCIe
Endpoint can potentially optimize the processing order of RO TLPs with
things like a local multi-channel Memory Controller in order to achieve the
highest transfer bandwidth possible.

  However, we have discovered at least two PCIe 3.0 Root Complex
implementations which have problems with TLPs directed at them with the
Relaxed Ordering Attribute set and I'm in the process of working up a Linux
Kernel PCI "Quirk" to allow those PCIe End Points to be marked as "not being
advisable to send RO TLPs to".  These problems range from "mere" Performance
Problems to outright Data Corruption.  I'm working with the vendors of these
...  "problematic" Root Complex implementations and hope to have this patch
submitted to the linux-pci list by tomorrow.

  By the way, it's important to note that just because, say, a Root Complex
has problems with RO TLPs directed at it, that doesn't mean that you want to
avoid all use of Relaxed Ordering within the PCIe Fabric.  For instance,
with the vendor whose Root Complex has a Performance Problem with RO TLPs
directed at it, it's perfectly reasonable -- and desired -- to use Relaxed
Ordering in Peer-to-Peer traffic.  Say for instance, with an NVMe <->
Ethernet application.

Casey

^ permalink raw reply

* Re: [PATCH net-next 08/18] net: dsa: mv88e6xxx: move generic VTU GetNext
From: Andrew Lunn @ 2017-04-27 18:59 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-9-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:26AM -0400, Vivien Didelot wrote:
> Even though every switch model has a different way to access the VTU
> Data bits, the base implementation of the VTU GetNext operation remains
> the same: wait, write the first VID to iterate from, start the
> operation, and read the next VID.
> 
> Move this generic implementation into global_vtu.c and abstract the

global1_vtu.c

> +int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
> +			     struct mv88e6xxx_vtu_entry *entry)
> +{
> +	int err;
> +
> +	err = mv88e6xxx_g1_vtu_op_wait(chip);
> +	if (err)
> +		return err;
> +
> +	/* Write the VID to iterate from only once */
> +	if (!entry->valid) {
> +		err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
> +		if (err)
> +			return err;
> +	}

Please could you add a bigger comment. It is not clear why you write
it, when it is invalid. That just seems wrong, and needs a good
comment to explain why it is correct, more than what you currently
have as a comment.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net-next 07/18] net: dsa: mv88e6xxx: move VTU VID accessors
From: Andrew Lunn @ 2017-04-27 18:51 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-8-vivien.didelot@savoirfairelinux.com>

> @@ -1464,13 +1457,16 @@ static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
>  				    struct mv88e6xxx_vtu_entry *entry)
>  {
>  	u16 op = GLOBAL_VTU_OP_VTU_LOAD_PURGE;
> -	u16 reg = 0;
>  	int err;
>  
>  	err = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (err)
>  		return err;
>  
> +	err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
> +	if (err)
> +		return err;
> +
>  	if (!entry->valid)
>  		goto loadpurge;
>  
> @@ -1496,14 +1492,7 @@ static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
>  		op |= (entry->fid & 0xf0) << 8;
>  		op |= entry->fid & 0xf;
>  	}
> -
> -	reg = GLOBAL_VTU_VID_VALID;
>  loadpurge:
> -	reg |= entry->vid & GLOBAL_VTU_VID_MASK;
> -	err = mv88e6xxx_g1_write(chip, GLOBAL_VTU_VID, reg);
> -	if (err)
> -		return err;
> -
>  	return mv88e6xxx_g1_vtu_op(chip, op);
>  }

This is not obvious, why do the vtu_vid_write() at the beginning,
rather than at the end? Especially before the if (!entry->valid).

However, when you look at the rest of the patch, it is O.K.

It might of been better to do this in two patches, to make it clearer
what is going on.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net] net: adjust skb->truesize in ___pskb_trim()
From: Willem de Bruijn @ 2017-04-27 18:43 UTC (permalink / raw)
  To: Andrey Konovalov; +Cc: Eric Dumazet, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CAAeHK+xTtA-L7VJsP9GXHpVD=o-WOEJ+xD_sJ4b4O-F_aZx_aw@mail.gmail.com>

On Wed, Apr 26, 2017 at 1:08 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Wed, Apr 26, 2017 at 6:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
>> skb_try_coalesce() using syzkaller and a filter attached to a TCP
>> socket.
>>
>> As we did recently in commit 158f323b9868 ("net: adjust skb->truesize in
>> pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
>> via a call to skb_condense().
>>
>> If all frags were freed, then skb->truesize can be recomputed.
>>
>> This call can be done if skb is not yet owned, or destructor is
>> sock_edemux().
>
> Hi Eric,
>
> I still see the warning even with your patch.

Can this happen if sk_trim_filter_cap trims the skb to free some,
but not all, of the frags? If skb->data_len remains larger than
skb->end - skb->tail, skb_condense will not adjust the truesize.

^ permalink raw reply

* Re: [PATCH net-next 06/18] net: dsa: mv88e6xxx: move VTU SID accessors
From: Andrew Lunn @ 2017-04-27 18:37 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-7-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:24AM -0400, Vivien Didelot wrote:
> Add helpers to access the VTU SID register in the global1_vtu.c file.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 05/18] net: dsa: mv88e6xxx: move VTU FID accessors
From: Andrew Lunn @ 2017-04-27 18:35 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-6-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:23AM -0400, Vivien Didelot wrote:
> Add helpers to access the VTU FID register in the global1_vtu.c file.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 04/18] net: dsa: mv88e6xxx: move VTU flush
From: Andrew Lunn @ 2017-04-27 18:34 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-5-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:22AM -0400, Vivien Didelot wrote:
> Move the VTU flush operation to global1_vtu.c and call it from a
> mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 03/18] net: dsa: mv88e6xxx: move VTU Operation accessors
From: Andrew Lunn @ 2017-04-27 18:33 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-4-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:21AM -0400, Vivien Didelot wrote:
> Move the helper functions to access the Global 1 VTU Operation register
> to a new global1_vtu.c file, and get rid of the old underscore prefix
> naming convention. This file will be extended will all VTU/STU related
> code.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 03/18] net: dsa: mv88e6xxx: move VTU Operation accessors
From: Andrew Lunn @ 2017-04-27 18:32 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-4-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:21AM -0400, Vivien Didelot wrote:
> Move the helper functions to access the Global 1 VTU Operation register
> to a new global1_vtu.c file, and get rid of the old underscore prefix
> naming convention. This file will be extended will all VTU/STU related
> code.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/mv88e6xxx/Makefile      |  1 +
>  drivers/net/dsa/mv88e6xxx/chip.c        | 39 +++++++++------------------------
>  drivers/net/dsa/mv88e6xxx/global1.h     |  3 +++
>  drivers/net/dsa/mv88e6xxx/global1_vtu.c | 33 ++++++++++++++++++++++++++++
>  4 files changed, 47 insertions(+), 29 deletions(-)
>  create mode 100644 drivers/net/dsa/mv88e6xxx/global1_vtu.c
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile
> index 31d37a90cec7..6edd869c8d6f 100644
> --- a/drivers/net/dsa/mv88e6xxx/Makefile
> +++ b/drivers/net/dsa/mv88e6xxx/Makefile
> @@ -2,5 +2,6 @@ obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
>  mv88e6xxx-objs := chip.o
>  mv88e6xxx-objs += global1.o
>  mv88e6xxx-objs += global1_atu.o
> +mv88e6xxx-objs += global1_vtu.o
>  mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
>  mv88e6xxx-objs += port.o
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index f025d3c22dba..bf0350432337 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3,9 +3,6 @@
>   *
>   * Copyright (c) 2008 Marvell Semiconductor
>   *
> - * Copyright (c) 2015 CMC Electronics, Inc.
> - *	Added support for VLAN Table Unit operations
> - *
>   * Copyright (c) 2016 Andrew Lunn <andrew@lunn.ch>
>   *
>   * Copyright (c) 2016-2017 Savoir-faire Linux Inc.
> @@ -1266,31 +1263,15 @@ static void mv88e6xxx_port_fast_age(struct dsa_switch *ds, int port)
>  		netdev_err(ds->ports[port].netdev, "failed to flush ATU\n");
>  }
>  
> -static int _mv88e6xxx_vtu_wait(struct mv88e6xxx_chip *chip)
> -{
> -	return mv88e6xxx_g1_wait(chip, GLOBAL_VTU_OP, GLOBAL_VTU_OP_BUSY);
> -}
> -
> -static int _mv88e6xxx_vtu_cmd(struct mv88e6xxx_chip *chip, u16 op)
> -{
> -	int err;
> -
> -	err = mv88e6xxx_g1_write(chip, GLOBAL_VTU_OP, op);
> -	if (err)
> -		return err;
> -
> -	return _mv88e6xxx_vtu_wait(chip);
> -}
> -
>  static int _mv88e6xxx_vtu_stu_flush(struct mv88e6xxx_chip *chip)
>  {
>  	int ret;
>  
> -	ret = _mv88e6xxx_vtu_wait(chip);
> +	ret = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (ret < 0)
>  		return ret;
>  
> -	return _mv88e6xxx_vtu_cmd(chip, GLOBAL_VTU_OP_FLUSH_ALL);
> +	return mv88e6xxx_g1_vtu_op(chip, GLOBAL_VTU_OP_FLUSH_ALL);
>  }
>  
>  static int _mv88e6xxx_vtu_stu_data_read(struct mv88e6xxx_chip *chip,
> @@ -1380,11 +1361,11 @@ static int _mv88e6xxx_vtu_getnext(struct mv88e6xxx_chip *chip,
>  	u16 val;
>  	int err;
>  
> -	err = _mv88e6xxx_vtu_wait(chip);
> +	err = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (err)
>  		return err;
>  
> -	err = _mv88e6xxx_vtu_cmd(chip, GLOBAL_VTU_OP_VTU_GET_NEXT);
> +	err = mv88e6xxx_g1_vtu_op(chip, GLOBAL_VTU_OP_VTU_GET_NEXT);
>  	if (err)
>  		return err;

Hi Vivien

This patch got me for a while. It initially looks like a bad idea. It
is not until you read all the patches, you can see why it is actually
good. It is a simple mechanical translation, which is easy to
review. But it seemed odd to be using these names in chip.c. But later
patches solve that, by moving the code in global1_vtu.c.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 02/18] net: dsa: mv88e6xxx: split VTU entry data member
From: Andrew Lunn @ 2017-04-27 18:25 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-3-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:20AM -0400, Vivien Didelot wrote:
> VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
> 802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.
> 
> Some chips such as 88E6185 use different Data registers offsets for
> ports state and membership, and program them in a single operation.
> 
> Other chips such as 88E6352 use the same register layout but program
> them in distinct operations (an indirect table is used for 802.1s.)
> 
> Newer chips such as 88E6390 use the same offsets for both state and
> membership in distinct operations, thus require multiple data accesses.
> 
> To correctly abstract this, split the "data" structure member of
> mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
> VTU support for newer chips.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH] igb: make a few local functions static
From: Colin King @ 2017-04-27 17:59 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

clean up a few sparse warnings, these following
functions can be made static:

drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_add_mac_filter' was not declared. Should it be static?
drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_del_mac_filter' was not declared. Should it be static?
drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_set_vf_mac_filter' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 1cf74aa4ebd9..b0021819b9d0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6457,8 +6457,8 @@ static void igb_set_default_mac_filter(struct igb_adapter *adapter)
 	igb_rar_set_index(adapter, 0);
 }
 
-int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
-		       const u8 queue)
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+			      const u8 queue)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	int rar_entries = hw->mac.rar_entry_count -
@@ -6487,8 +6487,8 @@ int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
 	return -ENOSPC;
 }
 
-int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
-		       const u8 queue)
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+			      const u8 queue)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	int rar_entries = hw->mac.rar_entry_count -
@@ -6540,8 +6540,8 @@ static int igb_uc_unsync(struct net_device *netdev, const unsigned char *addr)
 	return 0;
 }
 
-int igb_set_vf_mac_filter(struct igb_adapter *adapter, const int vf,
-			  const u32 info, const u8 *addr)
+static int igb_set_vf_mac_filter(struct igb_adapter *adapter, const int vf,
+				 const u32 info, const u8 *addr)
 {
 	struct pci_dev *pdev = adapter->pdev;
 	struct vf_data_storage *vf_data = &adapter->vf_data[vf];
-- 
2.11.0


^ permalink raw reply related

* Re: [patch net-next 00/10] net: sched: introduce multichain support for filters
From: Cong Wang @ 2017-04-27 17:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	David Ahern, Eric Dumazet, Stephen Hemminger, Daniel Borkmann,
	Alexander Duyck, mlxsw, Simon Horman
In-Reply-To: <1493291540-2119-1-git-send-email-jiri@resnulli.us>

On Thu, Apr 27, 2017 at 4:12 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Simple example:
> $ tc qdisc add dev eth0 ingress
> $ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
> $ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
> $ tc filter show dev eth0 root

Interesting.

I don't look into the code yet. If I understand the concepts correctly,
so with your patchset we can mark either filter with a chain No. to
choose which chain it belongs to _logically_ even though
_physically_ it is still in the old-fashion chain (prio, proto)?

If so, you have to ensure proto is same since the protocol of
the packet does not change dynamically? And the original
priority becomes pointless with chains since we can just to
any other chain in any order?

By default, without any chain No., you use 0 for all the chains,
so the old-fashion chain still works.

Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox