Netdev List
 help / color / mirror / Atom feed
* [PATCH] net/9p: correct some comment errors in 9p file system code
From: Sun Lianwen @ 2018-05-08  1:49 UTC (permalink / raw)
  To: viro, rdunlap; +Cc: netdev

There are follow comment errors:
1 The function name is wrong in p9_release_pages() comment.
2 The function name and variable name is wrong in p9_poll_workfn() comment.
3 There is no variable dm_mr and lkey in struct p9_trans_rdma.
4 The function name is wrong in rdma_create_trans() comment.
5 There is no variable initialized in struct virtio_chan.
6 The variable name is wrong in p9_virtio_zc_request() comment.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
---
 net/9p/trans_common.c | 2 +-
 net/9p/trans_fd.c     | 4 ++--
 net/9p/trans_rdma.c   | 4 +---
 net/9p/trans_virtio.c | 5 ++---
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index 38aa6345bdfa..b718db2085b2 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -16,7 +16,7 @@
 #include <linux/module.h>
 
 /**
- *  p9_release_req_pages - Release pages after the transaction.
+ *  p9_release_pages - Release pages after the transaction.
  */
 void p9_release_pages(struct page **pages, int nr_pages)
 {
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 0cfba919d167..848969fe7979 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1092,8 +1092,8 @@ static struct p9_trans_module p9_fd_trans = {
 };
 
 /**
- * p9_poll_proc - poll worker thread
- * @a: thread state and arguments
+ * p9_poll_workfn - poll worker thread
+ * @work: work queue
  *
  * polls all v9fs transports for new events and queues the appropriate
  * work to the work queue
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 6d8e3031978f..88c71c0e95df 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -68,8 +68,6 @@
  * @pd: Protection Domain pointer
  * @qp: Queue Pair pointer
  * @cq: Completion Queue pointer
- * @dm_mr: DMA Memory Region pointer
- * @lkey: The local access only memory region key
  * @timeout: Number of uSecs to wait for connection management events
  * @privport: Whether a privileged port may be used
  * @port: The port to use
@@ -632,7 +630,7 @@ static int p9_rdma_bind_privport(struct p9_trans_rdma *rdma)
 }
 
 /**
- * trans_create_rdma - Transport method for creating atransport instance
+ * rdma_create_trans - Transport method for creating a transport instance
  * @client: client instance
  * @addr: IP address string
  * @args: Mount options string
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 3aa5a93ad107..4d0372263e5d 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -60,7 +60,6 @@ static atomic_t vp_pinned = ATOMIC_INIT(0);
 
 /**
  * struct virtio_chan - per-instance transport information
- * @initialized: whether the channel is initialized
  * @inuse: whether the channel is in use
  * @lock: protects multiple elements within this structure
  * @client: client instance
@@ -385,8 +384,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
  * @uidata: user bffer that should be ued for zero copy read
  * @uodata: user buffer that shoud be user for zero copy write
  * @inlen: read buffer size
- * @olen: write buffer size
- * @hdrlen: reader header size, This is the size of response protocol data
+ * @outlen: write buffer size
+ * @in_hdr_len: reader header size, This is the size of response protocol data
  *
  */
 static int
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] net/9p: correct some comment errors in 9p file system code
From: Randy Dunlap @ 2018-05-08  1:20 UTC (permalink / raw)
  To: Sun Lianwen, viro; +Cc: netdev
In-Reply-To: <20180508004551.21714-1-sunlw.fnst@cn.fujitsu.com>

On 05/07/2018 05:45 PM, Sun Lianwen wrote:
> There are follow comment errors:
> 1 The function name is wrong in p9_release_pages() comment.
> 2 The function name and variable name is wrong in p9_poll_workfn() comment.
> 3 There is no variable dm_mr and lkey in struct p9_trans_rdma.
> 4 The function name is wrong in rdma_create_trans() comment.
> 5 There is no variable initialized in struct virtio_chan.
> 6 The variable name is wrong in p9_virtio_zc_request() comment.
> 
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
> ---
>  net/9p/trans_common.c | 2 +-
>  net/9p/trans_fd.c     | 4 ++--
>  net/9p/trans_rdma.c   | 4 +---
>  net/9p/trans_virtio.c | 5 ++---
>  4 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 6d8e3031978f..88c71c0e95df 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -632,7 +630,7 @@ static int p9_rdma_bind_privport(struct p9_trans_rdma *rdma)
>  }
>  
>  /**
> - * trans_create_rdma - Transport method for creating atransport instance
> + * rdma_create_trans - Transport method for creating atransport instance

                                                        a transport

>   * @client: client instance
>   * @addr: IP address string
>   * @args: Mount options string


-- 
~Randy

^ permalink raw reply

* Re: [PATCH 2/8] rhashtable: remove nulls_base and related code.
From: NeilBrown @ 2018-05-08  1:14 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <20180507092748.yqqldkwkxkynaaju@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

On Mon, May 07 2018, Herbert Xu wrote:

> On Sun, May 06, 2018 at 07:37:49AM +1000, NeilBrown wrote:
>> 
>> I can see no evidence that this is required for anything, as it isn't
>> use and I'm fairly sure that in it's current form - it cannot be used.
>
> Search for nulls in net/ipv4.  This is definitely used throughout
> the network stack.  As the aim is to convert as many existing hash
> tables to rhashtable as possible, we want to keep this.

Yes, I'm sure that nulls lists are very useful and I can see them used
in net/ipv4 (and I would like to use them myself).  I don't want to
remove the nulls list concept from rhashtable, and my patch doesn't do
that.

It just removes nulls_base (as the subject says) and stops setting any
particular value in the nulls pointer, because the value is currently
never tested.

The point of nulls lists is, as I'm sure you know, to detect if an
object was moved to a different chain and to restart the search in that
case.
This means that on an unsuccessful search, we need to test
get_nulls_value() of the tail pointer.
Several times in net/ipv4/inet_hashtables.c (for example) we see code
like:

	/*
	 * if the nulls value we got at the end of this lookup is
	 * not the expected one, we must restart lookup.
	 * We probably met an item that was moved to another chain.
	 */
	if (get_nulls_value(node) != slot)
		goto begin;

which does exactly that.  This test-and-restart cannot be done by a
caller to rhashtable_lookup_fast() as the caller doesn't even get to
the tail nulls.
It would need to be done in rhashtable.[ch].
If you like, I can make this functionality work rather than just
removing the useless parts.

I would change INIT_RHT_NULLS_HEAD() to return to be
#define INIT_RHT_NULLS_HEAD(ptr, ht, hash) \
	((ptr) = (void*)NULLS_MARKER(((unsigned long)ptr)>>1)

Then when __rhashtable_lookup() comes to the end of the chain
without finding anything it would do something like
  if (get_nulls_value(he)<<1 == (unsigned long)head)
    goto restart;

where 'head' is the head of the list that we would have saved at the
start.

i.e. instead of using a nulls_base and limiting the size of the hash, we
store the address of the bucket in the nulls.

In my mind that should be a separate patch.  First remove the unused,
harmful code.  Then add code to provide useful functionality.  But
I can put the two together in one patch if you would prefer that.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH 6/8] rhashtable: further improve stability of rhashtable_walk
From: NeilBrown @ 2018-05-08  0:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <20180507093019.qkz75qm7rocnbfnl@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 3398 bytes --]

On Mon, May 07 2018, Herbert Xu wrote:

> On Sun, May 06, 2018 at 07:50:54AM +1000, NeilBrown wrote:
>>
>> Do we?  How could we fix it for both rhashtable and rhltable?
>
> Well I suggested earlier to insert the walker object into the
> hash table, which would be applicable regardless of whether it
> is a normal rhashtable of a rhltable.

Oh yes, that idea.  I had considered it and discarded it.
Moving a walker object around in an rcu-list is far from straight
forward.  A lockless iteration of the list would need to be very careful
not to be tripped up by the walker-object.  I don't think we want to
make the search code more complex.

I explored the idea a bit more and I think that any solution that we
came up with along those lines would look quite different for regular
rhash_tables than for rhl_tables.  So it is probably best to keep them
quite separate.
i.e. use the scheme I proposed for rhashtables and use something else
entirely for rhltables.

My current thinking for a stable walk for rhl tables goes like this:

- we embed
     struct rhl_cursor {
         struct rhl_cursor *next;
         int unseen;
     }
  in struct rhashtable_iter.

- on rhashtable_stop(), we:
  + lock the current hash chain
  + find the next object that is still in the table (after ->p).
  + count the number of objects from there to the end to the
     end of the rhlist_head.next list.
  + store that count in rhl_cursor.unseen
  + change the ->next pointer at the end of the list to point to the
    rhl_cursor, but with the lsb set.  If there was already an
    rhl_cursor there, they are chained together on ->next.

- on rhashtable_start(), we lock the hash chain and search the hash
  chain and sub-chains for  ->p or the rhl_cursor.
  If ->p is still there, that can be used as a reliable anchor.
  If ->p isn't found but the rhl_cursor is, then the 'unseen' counter
  tells us where to start in that rhl_list.
  If neither are found, then we start at the beginning of the hash chain.
  If the rhl_cursor is found, it is removed.

- lookup and insert can almost completely ignore the cursor.
  rhl_for_each_rcu() needs to detect the end-of-list by looking for
  lsb set, but that is the only change.
  As _insert() adds new objects to the head of the rhlist, that
  won't disturb the accuracy of the 'unseen' count.  The newly inserted
  object won't be seen by the walk, but that is expected.

- delete needs some extra handling.
  + When an object is deleted from an rhlist and the list does not become
    empty, then it counts the number of objects to the end of the list,
    and possibly decrements the 'unseen' counter on any rhl_cursors that
    are at the end of the list.
  + when an object is deleted resulting in an empty rhlist, any
    cursors at the end of the list needs to be moved to the next
    list in the hash chain, and their 'unseen' count needs to be set to
    the length of the list.
    If there is no next object in the hash chain, then the iter.slot is
    incremented and the rhlist_curs is left unconnected.

- rehash needs to remove any cursors from the end of an rhlist before
  moving them to the new table.  The cursors are left disconnected.

I'm happy to implement this if you think the approach is workable and
the functionality is necessary.  However there are no current users of
rhltable_walk_inter that would benefit from this.


Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Jia-Ju Bai @ 2018-05-08  0:51 UTC (permalink / raw)
  To: Eric Dumazet, davem, fthain, joe; +Cc: netdev, linux-kernel
In-Reply-To: <e48345b4-fe9f-d267-777b-6a51baba0307@gmail.com>



On 2018/5/7 22:15, Eric Dumazet wrote:
>
> On 05/07/2018 07:08 AM, Jia-Ju Bai wrote:
>> The write operations to "dev->stats" are protected by
>> the spinlock on line 862-864, but the read operations to
>> this data on line 858 and 867 are not protected by the spinlock.
>> Thus, there may exist data races for "dev->stats".
>>
>> To fix the data races, the read operations to "dev->stats" are
>> protected by the spinlock, and a local variable is used for return.
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>> ---
>>   drivers/net/ethernet/8390/lib8390.c | 14 ++++++++++----
>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
>> index c9c55c9eab9f..198952247d30 100644
>> --- a/drivers/net/ethernet/8390/lib8390.c
>> +++ b/drivers/net/ethernet/8390/lib8390.c
>> @@ -852,19 +852,25 @@ static struct net_device_stats *__ei_get_stats(struct net_device *dev)
>>   	unsigned long ioaddr = dev->base_addr;
>>   	struct ei_device *ei_local = netdev_priv(dev);
>>   	unsigned long flags;
>> +	struct net_device_stats *stats;
>> +
>> +	spin_lock_irqsave(&ei_local->page_lock, flags);
>>   
>>   	/* If the card is stopped, just return the present stats. */
>> -	if (!netif_running(dev))
>> -		return &dev->stats;
>> +	if (!netif_running(dev)) {
>> +		stats = &dev->stats;
>> +		spin_unlock_irqrestore(&ei_local->page_lock, flags);
>> +		return stats;
>> +	}
>>   
>> -	spin_lock_irqsave(&ei_local->page_lock, flags);
>>   	/* Read the counter registers, assuming we are in page 0. */
>>   	dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
>>   	dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
>>   	dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);
>> +	stats = &dev->stats;
>>   	spin_unlock_irqrestore(&ei_local->page_lock, flags);
>>   
>> -	return &dev->stats;
>> +	return stats;
>>   }
>>   
>>   /*
>>
> dev->stats is not a pointer, it is an array embedded in the
> struct net_device
>
> So this patch is not needed, since dev->stats can not change.

Thanks for your reply :)

I do not understand that why "dev->stats can not change".
Its data is indeed changed by the code:
      dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
      dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
      dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);

So I think a data race may occur when returning "dev->stats" without 
lock protection.

By the way, I find this possible data race is similar to the previous 
commit 7b31b4deda76 for the tg3 driver.


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* [PATCH] net/9p: correct some comment errors in 9p file system code
From: Sun Lianwen @ 2018-05-08  0:45 UTC (permalink / raw)
  To: viro; +Cc: netdev

There are follow comment errors:
1 The function name is wrong in p9_release_pages() comment.
2 The function name and variable name is wrong in p9_poll_workfn() comment.
3 There is no variable dm_mr and lkey in struct p9_trans_rdma.
4 The function name is wrong in rdma_create_trans() comment.
5 There is no variable initialized in struct virtio_chan.
6 The variable name is wrong in p9_virtio_zc_request() comment.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/9p/trans_common.c | 2 +-
 net/9p/trans_fd.c     | 4 ++--
 net/9p/trans_rdma.c   | 4 +---
 net/9p/trans_virtio.c | 5 ++---
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index 38aa6345bdfa..b718db2085b2 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -16,7 +16,7 @@
 #include <linux/module.h>
 
 /**
- *  p9_release_req_pages - Release pages after the transaction.
+ *  p9_release_pages - Release pages after the transaction.
  */
 void p9_release_pages(struct page **pages, int nr_pages)
 {
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 0cfba919d167..848969fe7979 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1092,8 +1092,8 @@ static struct p9_trans_module p9_fd_trans = {
 };
 
 /**
- * p9_poll_proc - poll worker thread
- * @a: thread state and arguments
+ * p9_poll_workfn - poll worker thread
+ * @work: work queue
  *
  * polls all v9fs transports for new events and queues the appropriate
  * work to the work queue
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 6d8e3031978f..88c71c0e95df 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -68,8 +68,6 @@
  * @pd: Protection Domain pointer
  * @qp: Queue Pair pointer
  * @cq: Completion Queue pointer
- * @dm_mr: DMA Memory Region pointer
- * @lkey: The local access only memory region key
  * @timeout: Number of uSecs to wait for connection management events
  * @privport: Whether a privileged port may be used
  * @port: The port to use
@@ -632,7 +630,7 @@ static int p9_rdma_bind_privport(struct p9_trans_rdma *rdma)
 }
 
 /**
- * trans_create_rdma - Transport method for creating atransport instance
+ * rdma_create_trans - Transport method for creating atransport instance
  * @client: client instance
  * @addr: IP address string
  * @args: Mount options string
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 3aa5a93ad107..4d0372263e5d 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -60,7 +60,6 @@ static atomic_t vp_pinned = ATOMIC_INIT(0);
 
 /**
  * struct virtio_chan - per-instance transport information
- * @initialized: whether the channel is initialized
  * @inuse: whether the channel is in use
  * @lock: protects multiple elements within this structure
  * @client: client instance
@@ -385,8 +384,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
  * @uidata: user bffer that should be ued for zero copy read
  * @uodata: user buffer that shoud be user for zero copy write
  * @inlen: read buffer size
- * @olen: write buffer size
- * @hdrlen: reader header size, This is the size of response protocol data
+ * @outlen: write buffer size
+ * @in_hdr_len: reader header size, This is the size of response protocol data
  *
  */
 static int
-- 
2.17.0

^ permalink raw reply related

* linux-next: manual merge of the bpf-next tree with the s390 tree
From: Stephen Rothwell @ 2018-05-08  0:26 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Networking,
	Martin Schwidefsky, Heiko Carstens
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

Hi all,

Today's linux-next merge of the bpf-next tree got a conflict in:

  arch/s390/net/bpf_jit.S

between commit:

  de5cb6eb514e ("s390: use expoline thunks in the BPF JIT")

from the s390 tree and commit:

  e1cf4befa297 ("bpf, s390x: remove ld_abs/ld_ind")

from the bpf-next tree.

I fixed it up (I just removed the file as the latter does) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-08  0:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown
In-Reply-To: <20180507165306.541bd1f2@xeon-e3>



On 5/7/2018 4:53 PM, Stephen Hemminger wrote:
> On Mon,  7 May 2018 15:10:44 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> +static struct net_device *net_failover_get_bymac(u8 *mac,
>> +						 struct net_failover_ops **ops)
>> +{
>> +	struct net_device *failover_dev;
>> +	struct net_failover *failover;
>> +
>> +	spin_lock(&net_failover_lock);
>> +	list_for_each_entry(failover, &net_failover_list, list) {
>> +		failover_dev = rtnl_dereference(failover->failover_dev);
>> +		if (ether_addr_equal(failover_dev->perm_addr, mac)) {
>> +			*ops = rtnl_dereference(failover->ops);
>> +			spin_unlock(&net_failover_lock);
>> +			return failover_dev;
>> +		}
>> +	}
>> +	spin_unlock(&net_failover_lock);
>> +	return NULL;
>> +}
> This is broken if non-ethernet devices such as Infiniband are present.

There is check to make sure that a slave and failover devices are of the same type in
net_failover_slave_register()

	failover_dev = net_failover_get_bymac(slave_dev->perm_addr, &nfo_ops);
         if (!failover_dev)
                 goto done;

         if (failover_dev->type != slave_dev->type)
                 goto done;

Do you think this is not good enough? I had an explicit check for ARPHRD_ETHER in
earlier patchsets, but removed it based on Jiri's comment.

^ permalink raw reply

* Re: [PATCH 8/8] rhashtable: don't hold lock on first table throughout insertion.
From: NeilBrown @ 2018-05-08  0:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <20180507092939.vhps3uf2vdckf7ky@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

On Mon, May 07 2018, Herbert Xu wrote:

> On Mon, May 07, 2018 at 08:24:41AM +1000, NeilBrown wrote:
>>
>> This is true, but I don't see how it is relevant.
>> At some point, each thread will find that the table they have just
>> locked for their search key, has a NULL 'future_tbl' pointer.
>> At the point, the thread can know that the key is not in any table,
>> and that no other thread can add the key until the lock is released.
>
> The updating of future_tbl is not synchronised with insert threads.
> Therefore it is entirely possible that two inserters end up on
> different tables as their "latest" table.  This must not be allowed
> to occur.

I disagree.
Certainly the update of future_tbl is not synchronised with insert
threads.
However there is only a single update to any given future_tbl (from NULL
to non-NULL) and two insert threads for the same key will see that
update in a synchronized way as they look at it while holding the bucket
lock for that key.
It is certainly true if that two inserters can end up on different
tables as their "latest" table, but that doesn't result in a problem.
e.g. T1 might see A as the latest table, and T2 might see B where
  A.future_tbl == B
In that case, T2 must have examined A (under the lock) *after* T1
examined A, and so will have seen if T1 inserted anything.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-08  0:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown
In-Reply-To: <20180507165911.45b14c58@xeon-e3>

On 5/7/2018 4:59 PM, Stephen Hemminger wrote:
> On Mon,  7 May 2018 15:10:44 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> +	if (netif_running(failover_dev)) {
>> +		err = dev_open(slave_dev);
>> +		if (err && (err != -EBUSY)) {
>> +			netdev_err(failover_dev, "Opening slave %s failed err:%d\n",
>> +				   slave_dev->name, err);
>> +			goto err_dev_open;
>> +		}
>> +	}
>> +
>> +	netif_addr_lock_bh(failover_dev);
>> +	dev_uc_sync_multiple(slave_dev, failover_dev);
>> +	dev_uc_sync_multiple(slave_dev, failover_dev);
>> +	netif_addr_unlock_bh(failover_dev);
>> +
> The order of these is backwards, you want to sync addresses before bringing up.
> Also, doing it this way does not allow udev/systemd the chance to rename VF devices.

During my testing, i noticed that dev_open() may fail with EBUSY in certain scenarios,
If so, the opening of the slave is handled after the rename via the NETDEV_CHANGENAME
event handler.

>
> The complexity of this whole failover mechanism does not make life easier,
> more reliable, or safer for netvsc. I though that was the whole reason for having
> common code.

netvsc doesn't go through this code.

	if (nfo_ops && nfo_ops->slave_register)
                 return nfo_ops->slave_register(slave_dev, failover_dev);

So there is no change in event handling for netvsc 2-netdev model.

^ permalink raw reply

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-07 23:59 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown
In-Reply-To: <1525731046-10989-3-git-send-email-sridhar.samudrala@intel.com>

On Mon,  7 May 2018 15:10:44 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> +	if (netif_running(failover_dev)) {
> +		err = dev_open(slave_dev);
> +		if (err && (err != -EBUSY)) {
> +			netdev_err(failover_dev, "Opening slave %s failed err:%d\n",
> +				   slave_dev->name, err);
> +			goto err_dev_open;
> +		}
> +	}
> +
> +	netif_addr_lock_bh(failover_dev);
> +	dev_uc_sync_multiple(slave_dev, failover_dev);
> +	dev_uc_sync_multiple(slave_dev, failover_dev);
> +	netif_addr_unlock_bh(failover_dev);
> +

The order of these is backwards, you want to sync addresses before bringing up.
Also, doing it this way does not allow udev/systemd the chance to rename VF devices.

The complexity of this whole failover mechanism does not make life easier,
more reliable, or safer for netvsc. I though that was the whole reason for having
common code.

^ permalink raw reply

* [for-next 5/6] net/mlx5: Cleanup unused field in Work Queue parameters
From: Saeed Mahameed @ 2018-05-07 23:53 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Tariq Toukan, Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

Remove the 'linear' field from struct mlx5_wq_param.
It is redundant, set but never read.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/wq.h      | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b29c1d93f058..f60905648797 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1896,7 +1896,6 @@ static void mlx5e_build_rq_param(struct mlx5e_priv *priv,
 	MLX5_SET(rqc, rqc, scatter_fcs,    params->scatter_fcs_en);
 
 	param->wq.buf_numa_node = dev_to_node(&mdev->pdev->dev);
-	param->wq.linear = 1;
 }
 
 static void mlx5e_build_drop_rq_param(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.h b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
index fca90b94596d..f3dfa0ca3c5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
@@ -38,7 +38,6 @@
 #include <linux/mlx5/qp.h>
 
 struct mlx5_wq_param {
-	int		linear;
 	int		buf_numa_node;
 	int		db_numa_node;
 };
-- 
2.14.3

^ permalink raw reply related

* [for-next 6/6] net/mlx5: fix spelling mistake: "modfiy" -> "modify"
From: Saeed Mahameed @ 2018-05-07 23:53 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Colin Ian King, Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in netdev_warn warning message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 610d485c4b03..f64b5e78519b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -565,7 +565,7 @@ static void arfs_modify_rule_rq(struct mlx5e_priv *priv,
 	err =  mlx5_modify_rule_destination(rule, &dst, NULL);
 	if (err)
 		netdev_warn(priv->netdev,
-			    "Failed to modfiy aRFS rule destination to rq=%d\n", rxq);
+			    "Failed to modify aRFS rule destination to rq=%d\n", rxq);
 }
 
 static void arfs_handle_work(struct work_struct *work)
-- 
2.14.3

^ permalink raw reply related

* [for-next 3/6] net/mlx5: Refactor num of blocks in mailbox calculation
From: Saeed Mahameed @ 2018-05-07 23:53 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Moshe Shemesh, Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Moshe Shemesh <moshe@mellanox.com>

Get the logic that calculates the number of blocks in a command mailbox
into a dedicated function.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 21cd1703a862..0f7062104ad9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -135,6 +135,14 @@ static struct mlx5_cmd_layout *get_inst(struct mlx5_cmd *cmd, int idx)
 	return cmd->cmd_buf + (idx << cmd->log_stride);
 }
 
+static int mlx5_calc_cmd_blocks(struct mlx5_cmd_msg *msg)
+{
+	int size = msg->len;
+	int blen = size - min_t(int, sizeof(msg->first.data), size);
+
+	return DIV_ROUND_UP(blen, MLX5_CMD_DATA_BLOCK_SIZE);
+}
+
 static u8 xor8_buf(void *buf, size_t offset, int len)
 {
 	u8 *ptr = buf;
@@ -174,10 +182,7 @@ static void calc_block_sig(struct mlx5_cmd_prot_block *block)
 static void calc_chain_sig(struct mlx5_cmd_msg *msg)
 {
 	struct mlx5_cmd_mailbox *next = msg->next;
-	int size = msg->len;
-	int blen = size - min_t(int, sizeof(msg->first.data), size);
-	int n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1)
-		/ MLX5_CMD_DATA_BLOCK_SIZE;
+	int n = mlx5_calc_cmd_blocks(msg);
 	int i = 0;
 
 	for (i = 0; i < n && next; i++)  {
@@ -220,12 +225,9 @@ static void free_cmd(struct mlx5_cmd_work_ent *ent)
 static int verify_signature(struct mlx5_cmd_work_ent *ent)
 {
 	struct mlx5_cmd_mailbox *next = ent->out->next;
+	int n = mlx5_calc_cmd_blocks(ent->out);
 	int err;
 	u8 sig;
-	int size = ent->out->len;
-	int blen = size - min_t(int, sizeof(ent->out->first.data), size);
-	int n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1)
-		/ MLX5_CMD_DATA_BLOCK_SIZE;
 	int i = 0;
 
 	sig = xor8_buf(ent->lay, 0, sizeof(*ent->lay));
@@ -1137,7 +1139,6 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
 	struct mlx5_cmd_mailbox *tmp, *head = NULL;
 	struct mlx5_cmd_prot_block *block;
 	struct mlx5_cmd_msg *msg;
-	int blen;
 	int err;
 	int n;
 	int i;
@@ -1146,8 +1147,8 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
 	if (!msg)
 		return ERR_PTR(-ENOMEM);
 
-	blen = size - min_t(int, sizeof(msg->first.data), size);
-	n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1) / MLX5_CMD_DATA_BLOCK_SIZE;
+	msg->len = size;
+	n = mlx5_calc_cmd_blocks(msg);
 
 	for (i = 0; i < n; i++) {
 		tmp = alloc_cmd_box(dev, flags);
@@ -1165,7 +1166,6 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
 		head = tmp;
 	}
 	msg->next = head;
-	msg->len = size;
 	return msg;
 
 err_alloc:
-- 
2.14.3

^ permalink raw reply related

* [for-next 4/6] net/mlx5: Fix dump_command mailbox length printed
From: Saeed Mahameed @ 2018-05-07 23:53 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Moshe Shemesh, Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Moshe Shemesh <moshe@mellanox.com>

Dump command mailbox length printed was correct only if data_only flag
was set. For the case that data_only flag was clear the offset to stop
printing at was wrong and so the buffer printed was too short.
Changed the print loop to stop according to number of buffers in
mailbox.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 0f7062104ad9..487388aed98f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -722,9 +722,11 @@ static void dump_command(struct mlx5_core_dev *dev,
 	struct mlx5_cmd_msg *msg = input ? ent->in : ent->out;
 	u16 op = MLX5_GET(mbox_in, ent->lay->in, opcode);
 	struct mlx5_cmd_mailbox *next = msg->next;
+	int n = mlx5_calc_cmd_blocks(msg);
 	int data_only;
 	u32 offset = 0;
 	int dump_len;
+	int i;
 
 	data_only = !!(mlx5_core_debug_mask & (1 << MLX5_CMD_DATA));
 
@@ -751,7 +753,7 @@ static void dump_command(struct mlx5_core_dev *dev,
 		offset += sizeof(*ent->lay);
 	}
 
-	while (next && offset < msg->len) {
+	for (i = 0; i < n && next; i++)  {
 		if (data_only) {
 			dump_len = min_t(int, MLX5_CMD_DATA_BLOCK_SIZE, msg->len - offset);
 			dump_buf(next->buf, dump_len, 1, offset);
-- 
2.14.3

^ permalink raw reply related

* [for-next 2/6] net/mlx5: Decrease level of prints about non-existent MKEY
From: Saeed Mahameed @ 2018-05-07 23:53 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Leon Romanovsky <leonro@mellanox.com>

User-controlled application can cause multiple prints as below to flood
dmesg. Since knowledge of failed MKey release is important for debug,
let's decrease its level to debug.

mlx5_core 0000:00:04.0: mlx5_core_destroy_mkey:127:(pid 2352): failed
radix tree delete of mkey 0x1ed700

Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/mr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index b9736f505bdf..f4f02f775c93 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -123,8 +123,8 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev,
 	deleted_mkey = radix_tree_delete(&table->tree, mlx5_base_mkey(mkey->key));
 	write_unlock_irqrestore(&table->lock, flags);
 	if (!deleted_mkey) {
-		mlx5_core_warn(dev, "failed radix tree delete of mkey 0x%x\n",
-			       mlx5_base_mkey(mkey->key));
+		mlx5_core_dbg(dev, "failed radix tree delete of mkey 0x%x\n",
+			      mlx5_base_mkey(mkey->key));
 		return -ENOENT;
 	}
 
-- 
2.14.3

^ permalink raw reply related

* [for-next 1/6] net/mlx5: remove some extraneous spaces in indentations
From: Saeed Mahameed @ 2018-05-07 23:52 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Colin Ian King, Saeed Mahameed
In-Reply-To: <20180507235304.25085-1-saeedm@mellanox.com>

From: Colin Ian King <colin.king@canonical.com>

There are several lines where there is an extraneous space causing
indentation misalignment. Remove them.

Cleans up Cocconelle warnings:

./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
with following code on line 410
./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
with following code on line 416
./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
with following code on line 422

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 02d6c5b5d502..4ca07bfb6b14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -407,21 +407,21 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn,
 	case MLX5_CMD_OP_RST2INIT_QP:
 		if (MBOX_ALLOC(mbox, rst2init_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_INIT2RTR_QP:
 		if (MBOX_ALLOC(mbox, init2rtr_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_RTR2RTS_QP:
 		if (MBOX_ALLOC(mbox, rtr2rts_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_RTS2RTS_QP:
 		if (MBOX_ALLOC(mbox, rts2rts_qp))
 			return -ENOMEM;
-- 
2.14.3

^ permalink raw reply related

* [pull request][for-next 0/6] Mellanox, mlx5 updates 2018-05-07
From: Saeed Mahameed @ 2018-05-07 23:52 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Leon Romanovsky, Jason Gunthorpe,
	Saeed Mahameed

Hi Dave & Doug,

This pull request includes misc updates and cleanups for mlx5 core
driver for both net and rdma next branches, for more information please
see tag log below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 60cc43fc888428bb2f18f08997432d426a243338:

  Linux 4.17-rc1 (2018-04-15 18:24:20 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-updates-2018-05-07

for you to fetch changes up to a8408f4e6db775e245f20edf12b13fd58cc03a1c:

  net/mlx5: fix spelling mistake: "modfiy" -> "modify" (2018-05-04 12:11:51 -0700)

----------------------------------------------------------------
mlx5-updates-2018-05-07

mlx5 core driver misc cleanups and updates:
 - fix spelling mistake: "modfiy" -> "modify"
 - Cleanup unused field in Work Queue parameters
 - dump_command mailbox length printed
 - Refactor num of blocks in mailbox calculation
 - Decrease level of prints about non-existent MKEY
 - remove some extraneous spaces in indentations

----------------------------------------------------------------
Colin Ian King (2):
      net/mlx5: remove some extraneous spaces in indentations
      net/mlx5: fix spelling mistake: "modfiy" -> "modify"

Leon Romanovsky (1):
      net/mlx5: Decrease level of prints about non-existent MKEY

Moshe Shemesh (2):
      net/mlx5: Refactor num of blocks in mailbox calculation
      net/mlx5: Fix dump_command mailbox length printed

Tariq Toukan (1):
      net/mlx5: Cleanup unused field in Work Queue parameters

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c     | 28 ++++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/mr.c      |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/qp.c      | 18 +++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/wq.h      |  1 -
 6 files changed, 27 insertions(+), 27 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-07 23:53 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown
In-Reply-To: <1525731046-10989-3-git-send-email-sridhar.samudrala@intel.com>

On Mon,  7 May 2018 15:10:44 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> +static struct net_device *net_failover_get_bymac(u8 *mac,
> +						 struct net_failover_ops **ops)
> +{
> +	struct net_device *failover_dev;
> +	struct net_failover *failover;
> +
> +	spin_lock(&net_failover_lock);
> +	list_for_each_entry(failover, &net_failover_list, list) {
> +		failover_dev = rtnl_dereference(failover->failover_dev);
> +		if (ether_addr_equal(failover_dev->perm_addr, mac)) {
> +			*ops = rtnl_dereference(failover->ops);
> +			spin_unlock(&net_failover_lock);
> +			return failover_dev;
> +		}
> +	}
> +	spin_unlock(&net_failover_lock);
> +	return NULL;
> +}

This is broken if non-ethernet devices such as Infiniband are present.

^ permalink raw reply

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-07 23:46 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown
In-Reply-To: <1525731046-10989-3-git-send-email-sridhar.samudrala@intel.com>

On Mon,  7 May 2018 15:10:44 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> This provides a generic interface for paravirtual drivers to listen
> for netdev register/unregister/link change events from pci ethernet
> devices with the same MAC and takeover their datapath. The notifier and
> event handling code is based on the existing netvsc implementation.
> 
> It exposes 2 sets of interfaces to the paravirtual drivers.
> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>    the failover module provides interfaces to create/destroy additional
>    master netdev and all the slave events are managed internally.
>           net_failover_create()
>           net_failover_destroy()
>    A failover netdev is created that acts a master device and controls 2
>    slave devices. The original virtio_net netdev is registered as 'standby'
>    netdev and a passthru/vf device with the same MAC gets registered as
>    'primary' netdev. Both 'standby' and 'failover' netdevs are associated
>    with the same 'pci' device.  The user accesses the network interface via
>    'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>    default for transmits when it is available with link up and running.
> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>    is created. The paravirtual driver registers each instance of netvsc
>    as a 'failover' netdev  along with a set of ops to manage the slave
>    events. There is no 'standby' netdev in this model. A passthru/vf device
>    with the same MAC gets registered as 'primary' netdev.
>           net_failover_register()
>           net_failover_unregister()
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>

You are conflating the net_failover device (3 device model) with
the generic network failover infrastructure into one file. There should be two
seperate files net/core/failover.c and drivers/net/failover.c which splits
the work into two parts (and acts a check for the api).

^ permalink raw reply

* [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Sridhar Samudrala @ 2018-05-07 23:09 UTC (permalink / raw)
  To: mst, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, sridhar.samudrala, jasowang,
	loseweigh, jiri, aaron.f.brown, qemu-devel

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature 'true'
by default as i am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
XML file?

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 hw/net/virtio-net.c                         | 2 ++
 include/standard-headers/linux/virtio_net.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+                      false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
+                                         * with the same MAC.
+                                         */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
 
 #ifndef VIRTIO_NET_NO_LEGACY
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Randy Dunlap @ 2018-05-07 22:39 UTC (permalink / raw)
  To: Sridhar Samudrala, mst, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, jiri, aaron.f.brown
In-Reply-To: <1525731046-10989-3-git-send-email-sridhar.samudrala@intel.com>

Hi,

On 05/07/2018 03:10 PM, Sridhar Samudrala wrote:
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  MAINTAINERS                |    7 +
>  include/linux/netdevice.h  |   16 +
>  include/net/net_failover.h |   52 +++
>  net/Kconfig                |   10 +
>  net/core/Makefile          |    1 +
>  net/core/net_failover.c    | 1044 ++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1130 insertions(+)
>  create mode 100644 include/net/net_failover.h
>  create mode 100644 net/core/net_failover.c


> diff --git a/net/Kconfig b/net/Kconfig
> index b62089fb1332..0540856676de 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -429,6 +429,16 @@ config MAY_USE_DEVLINK
>  config PAGE_POOL
>         bool
>  
> +config NET_FAILOVER
> +	tristate "Failover interface"
> +	default m

Need some justification for default m (as opposed to n).

> +	help
> +	  This provides a generic interface for paravirtual drivers to listen
> +	  for netdev register/unregister/link change events from pci ethernet

	                                                         PCI

> +	  devices with the same MAC and takeover their datapath. This also
> +	  enables live migration of a VM with direct attached VF by failing
> +	  over to the paravirtual datapath when the VF is unplugged.
> +
>  endif   # if NET
>  
>  # Used by archs to tell that they support BPF JIT compiler plus which flavour.

> diff --git a/net/core/net_failover.c b/net/core/net_failover.c
> new file mode 100644
> index 000000000000..8d60e74e3034
> --- /dev/null
> +++ b/net/core/net_failover.c
> @@ -0,0 +1,1044 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018, Intel Corporation. */
> +
> +/* A common module to handle registrations and notifications for paravirtual
> + * drivers to enable accelerated datapath and support VF live migration.
> + *
> + * The notifier and event handling code is based on netvsc driver and failover
> + * netdev management routines are based on bond/team driver.
> + *
> + */


> +/**
> + * net_failover_create - Create and register a failover instance
> + *
> + * @dev: standby netdev

    * @standby_dev: standby netdev

> + *
> + * Creates a failover netdev and registers a failover instance for a standby
> + * netdev. Used by paravirtual drivers that use 3-netdev model.
> + * The failover netdev acts as a master device and controls 2 slave devices -
> + * the original standby netdev and a VF netdev with the same MAC gets
> + * registered as primary netdev.
> + *
> + * Return: pointer to failover instance
> + */
> +struct net_failover *net_failover_create(struct net_device *standby_dev)
> +{
> +	struct device *dev = standby_dev->dev.parent;
> +	struct net_device *failover_dev;
> +	struct net_failover *failover;
> +	int err;
> +
> +	/* Alloc at least 2 queues, for now we are going with 16 assuming
> +	 * that VF devices being enslaved won't have too many queues.
> +	 */
> +	failover_dev = alloc_etherdev_mq(sizeof(struct net_failover_info), 16);
> +	if (!failover_dev) {
> +		dev_err(dev, "Unable to allocate failover_netdev!\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	dev_net_set(failover_dev, dev_net(standby_dev));
> +	SET_NETDEV_DEV(failover_dev, dev);
> +
> +	failover_dev->netdev_ops = &failover_dev_ops;
> +	failover_dev->ethtool_ops = &failover_ethtool_ops;
> +
> +	/* Initialize the device options */
> +	failover_dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
> +	failover_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE |
> +				       IFF_TX_SKB_SHARING);
> +
> +	/* don't acquire failover netdev's netif_tx_lock when transmitting */
> +	failover_dev->features |= NETIF_F_LLTX;
> +
> +	/* Don't allow failover devices to change network namespaces. */
> +	failover_dev->features |= NETIF_F_NETNS_LOCAL;
> +
> +	failover_dev->hw_features = FAILOVER_VLAN_FEATURES |
> +				    NETIF_F_HW_VLAN_CTAG_TX |
> +				    NETIF_F_HW_VLAN_CTAG_RX |
> +				    NETIF_F_HW_VLAN_CTAG_FILTER;
> +
> +	failover_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
> +	failover_dev->features |= failover_dev->hw_features;
> +
> +	memcpy(failover_dev->dev_addr, standby_dev->dev_addr,
> +	       failover_dev->addr_len);
> +
> +	failover_dev->min_mtu = standby_dev->min_mtu;
> +	failover_dev->max_mtu = standby_dev->max_mtu;
> +
> +	err = register_netdev(failover_dev);
> +	if (err) {
> +		dev_err(dev, "Unable to register failover_dev!\n");
> +		goto err_register_netdev;
> +	}
> +
> +	netif_carrier_off(failover_dev);
> +
> +	failover = net_failover_register(failover_dev, NULL);
> +	if (IS_ERR(failover))
> +		goto err_failover_register;
> +
> +	return failover;
> +
> +err_failover_register:
> +	unregister_netdev(failover_dev);
> +err_register_netdev:
> +	free_netdev(failover_dev);
> +
> +	return ERR_PTR(err);
> +}
> +EXPORT_SYMBOL_GPL(net_failover_create);



-- 
~Randy

^ permalink raw reply

* Sensors Expo
From: gemma lavery @ 2018-05-07 22:39 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]

Hi,            

 

I have noticed that your company is exhibiting in Sensors Expo on JUN 26 -
28 2018.

 

Would you be interested in the complete (B2B) contact information with email
addresses of Engineers and Engineering Professionals? 

 


Let us know your target contacts that you would like to reach so that I will
get back to you with few samples and other details in my next email.


 

List contains - Company Name, Web Address/URL, Contact name (First name,
middle name and last name), Specialty, Postal address (street address, city,
state, zip code, and country), Phone, and Direct email address.

 

Match Test for Appending: Just send us now 10 to 15 contacts in an Excel
sheet from your in-house database with missing email address, telephone
numbers, fax numbers or mailing addresses, we can append it for you at no
cost, this will help you check the quality of our services.

 


Regards,                                

Gemma

Marketing Associate

 

If you do not wish to receive future emails from us, please reply as "Leave
out".

 



---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 8494 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next v3 3/6] bpf: Add IPv6 Segment Routing helpers
From: Mathieu Xhonneux @ 2018-05-07 22:21 UTC (permalink / raw)
  To: Alexei Starovoitov, David Lebrun; +Cc: netdev, David Miller
In-Reply-To: <201805070716.KC2OLOyD%fengguang.wu@intel.com>

I'm not sure what would be the best approach here. These errors appear
when CONFIG_IPV6=m and CONFIG_IPV6_SEG6_LWTUNNEL=y (which is bool and
depends on IPv6, hence it is also modularized in this case), then
IS_ENABLED(CONFIG_IPV6_SEG6_LWTUNNEL) returns true, even though the
seg6_* symbols are not available when linking vmlinux. If I'm correct,
since net/core/filter.c is always built-in, all functions it uses must
also be built-in.

I didn't find any other dependency from net/core/filter.c using a
feature which can be modularized, hence the only solution I see here
is to create a new bool CONFIG variable, e.g. CONFIG_IPV6_SEG6_BPF,
which would require CONFIG_IPV6=y and CONFIG_IPV6_SEG6_LWTUNNEL=y. I
could then replace my #if IS_ENABLED(CONFIG_IPV6_SEG6_LWTUNNEL)
conditions by #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) in
net/core/filter.c.

Any comment on this ?

Thanks.

2018-05-07 0:29 GMT+01:00 kbuild test robot <lkp@intel.com>:
> Hi Mathieu,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on bpf-next/master]
>
> url:    https://github.com/0day-ci/linux/commits/Mathieu-Xhonneux/ipv6-sr-introduce-seg6local-End-BPF-action/20180506-233046
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: s390-allmodconfig (attached as .config)
> compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=s390
>
> All errors (new ones prefixed by >>):
>
>    net/core/filter.o: In function `bpf_push_seg6_encap':
>    filter.c:(.text+0xaf4c): undefined reference to `seg6_validate_srh'
>    filter.c:(.text+0xaf8a): undefined reference to `seg6_do_srh_inline'
>    filter.c:(.text+0xafc4): undefined reference to `seg6_do_srh_encap'
>    filter.c:(.text+0xb016): undefined reference to `seg6_lookup_nexthop'
>    net/core/filter.o: In function `bpf_lwt_seg6_store_bytes':
>>> (.text+0xb106): undefined reference to `seg6_bpf_srh_states'
>    net/core/filter.o: In function `bpf_lwt_seg6_action':
>    (.text+0xb2b0): undefined reference to `seg6_bpf_srh_states'
>>> (.text+0xb334): undefined reference to `seg6_validate_srh'
>>> (.text+0xb394): undefined reference to `seg6_lookup_nexthop'
>    (.text+0xb3c4): undefined reference to `seg6_lookup_nexthop'
>    net/core/filter.o: In function `bpf_lwt_seg6_adjust_srh':
>    (.text+0xb492): undefined reference to `seg6_bpf_srh_states'
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* [PATCH v1 iproute2-next 3/3] rdma: update man pages
From: Steve Wise @ 2018-05-07 15:53 UTC (permalink / raw)
  To: dsahern, leon; +Cc: stephen, netdev, linux-rdma
In-Reply-To: <cover.1525709213.git.swise@opengridcomputing.com>

Update the man pages for the resource attributes as well
as the driver-specific attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 man/man8/rdma-resource.8 | 29 ++++++++++++++++++++++++++---
 man/man8/rdma.8          |  2 +-
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/man/man8/rdma-resource.8 b/man/man8/rdma-resource.8
index ff5d25d..40b073d 100644
--- a/man/man8/rdma-resource.8
+++ b/man/man8/rdma-resource.8
@@ -7,13 +7,16 @@ rdma-resource \- rdma resource configuration
 .in +8
 .ti -8
 .B rdma
-.RI "[ " OPTIONS " ]"
-.B resource
-.RI  " { " COMMAND " | "
+.RI "[ " OPTIONS " ] " RESOURCE " { " COMMAND " | "
 .BR help " }"
 .sp
 
 .ti -8
+.IR RESOURCE " := { "
+.BR cm_id " | " cq " | " mr " | " pd " | " qp " }"
+.sp
+
+.ti -8
 .IR OPTIONS " := { "
 \fB\-j\fR[\fIson\fR] |
 \fB\-d\fR[\fIetails\fR] }
@@ -70,11 +73,31 @@ rdma res show qp link mlx5_4/- -d
 Detailed view.
 .RE
 .PP
+rdma res show qp link mlx5_4/- -dd
+.RS 4
+Detailed view including driver-specific details.
+.RE
+.PP
 rdma res show qp link mlx5_4/1 lqpn 0-6
 .RS 4
 Limit to specific Local QPNs.
 .RE
 .PP
+rdma resource show cm_id dst-port 7174
+.RS 4
+Show CM_IDs with destination ip port of 7174.
+.RE
+.PP
+rdma resource show cm_id src-addr 172.16.0.100
+.RS 4
+Show CM_IDs bound to local ip address 172.16.0.100
+.RE
+.PP
+rdma resource show cq pid 30489
+.RS 4
+Show CQs belonging to pid 30489
+.RE
+.PP
 
 .SH SEE ALSO
 .BR rdma (8),
diff --git a/man/man8/rdma.8 b/man/man8/rdma.8
index 6f88f37..12aa149 100644
--- a/man/man8/rdma.8
+++ b/man/man8/rdma.8
@@ -49,7 +49,7 @@ If there were any errors during execution of the commands, the application retur
 
 .TP
 .BR "\-d" , " --details"
-Output detailed information.
+Output detailed information.  Adding a second \-d includes driver-specific details.
 
 .TP
 .BR "\-p" , " --pretty"
-- 
1.8.3.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox