Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
From: Michael S. Tsirkin @ 2017-09-26 19:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <1506067355-5771-4-git-send-email-jasowang@redhat.com>

On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote:
> This patch introduces a helper which just increase the used idx. This
> will be used in pair with vhost_prefetch_desc_indices() by batching
> code.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c | 33 +++++++++++++++++++++++++++++++++
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 8424166d..6532cda 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used);
>  
> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n)
> +{
> +	u16 old, new;
> +
> +	old = vq->last_used_idx;
> +	new = (vq->last_used_idx += n);
> +	/* If the driver never bothers to signal in a very long while,
> +	 * used index might wrap around. If that happens, invalidate
> +	 * signalled_used index we stored. TODO: make sure driver
> +	 * signals at least once in 2^16 and remove this.
> +	 */
> +	if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
> +		vq->signalled_used_valid = false;
> +
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
> +			   &vq->used->idx)) {
> +		vq_err(vq, "Failed to increment used idx");
> +		return -EFAULT;
> +	}
> +	if (unlikely(vq->log_used)) {
> +		/* Log used index update. */
> +		log_write(vq->log_base,
> +			  vq->log_addr + offsetof(struct vring_used, idx),
> +			  sizeof(vq->used->idx));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vhost_add_used_idx);
> +
>  static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  			    struct vring_used_elem *heads,
>  			    unsigned count)
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 16c2cb6..5dd6c05 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
>  void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>  
>  int vhost_vq_init_access(struct vhost_virtqueue *);
> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n);
>  int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
>  int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
>  		     unsigned count);

Please change the API to hide the fact that there's an index that needs
to be updated.

> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH net v2] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: Florian Fainelli @ 2017-09-26 19:05 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926185721.12187-1-vivien.didelot@savoirfairelinux.com>

On 09/26/2017 11:57 AM, Vivien Didelot wrote:
> mv88e6xxx_g2_irq_free locks the registers mutex, but not
> mv88e6xxx_g1_irq_free, which results in a stack trace from
> assert_reg_lock when unloading the mv88e6xxx module. Fix this.
> 
> Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* [PATCH net v2] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: Vivien Didelot @ 2017-09-26 18:57 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

mv88e6xxx_g2_irq_free locks the registers mutex, but not
mv88e6xxx_g1_irq_free, which results in a stack trace from
assert_reg_lock when unloading the mv88e6xxx module. Fix this.

Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c6678aa9b4ef..e7ff7483d2fb 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3947,7 +3947,9 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
 	if (chip->irq > 0) {
 		if (chip->info->g2_irqs > 0)
 			mv88e6xxx_g2_irq_free(chip);
+		mutex_lock(&chip->reg_lock);
 		mv88e6xxx_g1_irq_free(chip);
+		mutex_unlock(&chip->reg_lock);
 	}
 }
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next] liquidio: fix format truncation warning reported by gcc 7.1.1
From: Felix Manlunas @ 2017-09-26 18:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla

From: Satanand Burla <satananda.burla@cavium.com>

gcc 7.1.1 with -Wformat-truncation reports these warnings:

drivers/net/ethernet/cavium/liquidio/lio_core.c: In function `octeon_setup_interrupt':
drivers/net/ethernet/cavium/liquidio/lio_core.c:1003:41: warning: `%u' directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 13 [-Wformat-truncation=]
       INTRNAMSIZ, "LiquidIO%u-pf%u-rxtx-%u",
                                         ^~
drivers/net/ethernet/cavium/liquidio/lio_core.c:1003:19: note: directive argument in the range [0, 2147483647]
       INTRNAMSIZ, "LiquidIO%u-pf%u-rxtx-%u",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/cavium/liquidio/lio_core.c:1002:5: note: `snprintf' output between 21 and 43 bytes into a destination of size 32
     snprintf(&queue_irq_names[IRQ_NAME_OFF(i)],
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       INTRNAMSIZ, "LiquidIO%u-pf%u-rxtx-%u",
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       oct->octeon_id, oct->pf_num, i);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/cavium/liquidio/lio_core.c:1008:41: warning: `%u' directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 13 [-Wformat-truncation=]
       INTRNAMSIZ, "LiquidIO%u-vf%u-rxtx-%u",
                                         ^~
drivers/net/ethernet/cavium/liquidio/lio_core.c:1008:19: note: directive argument in the range [0, 2147483647]
       INTRNAMSIZ, "LiquidIO%u-vf%u-rxtx-%u",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/cavium/liquidio/lio_core.c:1007:5: note: `snprintf' output between 21 and 43 bytes into a destination of size 32
     snprintf(&queue_irq_names[IRQ_NAME_OFF(i)],
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       INTRNAMSIZ, "LiquidIO%u-vf%u-rxtx-%u",
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       oct->octeon_id, oct->vf_num, i);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix them by changing the type of the "i" local variable from int to short.

Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 23f6b60..55c5d44 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -899,11 +899,12 @@ int octeon_setup_interrupt(struct octeon_device *oct, u32 num_ioqs)
 {
 	struct msix_entry *msix_entries;
 	char *queue_irq_names = NULL;
-	int i, num_interrupts = 0;
+	int num_interrupts = 0;
 	int num_alloc_ioq_vectors;
 	char *aux_irq_name = NULL;
 	int num_ioq_vectors;
 	int irqret, err;
+	short i;
 
 	oct->num_msix_irqs = num_ioqs;
 	if (oct->msix_on) {

^ permalink raw reply related

* Re: [PATCH] net: stmmac: Meet alignment requirements for DMA
From: Paul Burton @ 2017-09-26 18:48 UTC (permalink / raw)
  To: David Miller
  Cc: matt.redfearn, netdev, alexandre.torgue, peppe.cavallaro,
	linux-kernel, linux-mips, james.hogan
In-Reply-To: <20170926.103400.1009342814128022820.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 4599 bytes --]

Hi David,

On Tuesday, 26 September 2017 10:34:00 PDT David Miller wrote:
> From: Matt Redfearn <matt.redfearn@imgtec.com>
> Date: Tue, 26 Sep 2017 14:57:39 +0100
> 
> > Since the MIPS architecture requires software cache management,
> > performing a dma_map_single(TO_DEVICE) will writeback and invalidate
> > the cachelines for the required region. To comply with (our
> > interpretation of) the DMA API documentation, the MIPS implementation
> > expects a cacheline aligned & sized buffer, so that it can writeback a
> > set of complete cachelines. Indeed a recent patch
> > (https://patchwork.linux-mips.org/patch/14624/) causes the MIPS dma
> > mapping code to emit warnings if the buffer it is requested to sync is
> > not cachline aligned. This driver, as is, fails this test and causes
> > thousands of warnings to be emitted.
> 
> For a device write, if the device does what it is told and does not
> access bytes outside of the periphery of the DMA mapping, nothing bad
> can happen.
> 
> When the DMA buffer is mapped, the cpu side cachelines are flushed (even
> ones which are partially part of the requsted mapping, this is _fine_).

This is not fine. Let's presume we writeback+invalidate the cache lines that 
are only partially covered by the DMA mapping at its beginning or end, and 
just invalidate all the lines that are wholly covered by the mapping. So far 
so good.

> The device writes into only the bytes it was given access to, which
> starts at the DMA address.

OK - still fine but *only* if we don't write to anything that happens to be 
part of the cache lines that are only partially covered by the DMA mapping & 
make them dirty. If we do then any writeback of those lines will clobber data 
the device wrote, and any invalidation of them will discard data the CPU 
wrote.

How would you have us ensure that such writes don't occur?

This really does not feel to me like something to leave up to drivers without 
any means of checking or enforcing correctness. The requirement to align DMA 
mappings at cache-line boundaries, as outlined in Documentation/DMA-API.txt, 
allows this to be enforced. If you want to ignore this requirement then there 
is no clear way to verify that we aren't writing to cache lines involved in a 
DMA transfer whilst the transfer is occurring, and no sane way to handle those 
cache lines if we do.

> The unmap and/or sync operation after the DMA transfer needs to do
> nothing on the cpu side since the map operation flushed the cpu side
> caches already.
> 
> When the cpu reads, it will pull the cacheline from main memory and
> see what the device wrote.

This is not true, because:

1) CPUs may speculatively read data. In between the invalidations that we did
   earlier & the point at which the transfer completes, the CPU may have
   speculatively read any arbitrary part of the memory mapped for DMA.

2) If we wrote to any lines that are only partially covered by the DMA mapping
   then of course they're valid & dirty, and an access won't fetch from main
   memory.

We therefore need to perform cache invalidation again at the end of the 
transfer - on MIPS you can grep for cpu_needs_post_dma_flush to find this. 
>From a glance ARM has similar post-DMA invalidation in 
__dma_page_dev_to_cpu(), ARM64 in __dma_unmap_area() etc etc.

At this point what would you have us do with cache lines that are only 
partially covered by the DMA mapping? As above - if we writeback+invalidate we 
risk clobbering data written by the device. If we just invalidate we risk 
discarding data written by the CPU. And this is even ignoring the risk that a 
writeback of one of these lines happens due to eviction during the DMA 
transfer, which we have no control over at all.

> It's not like the device can put "garbage" into the bytes earlier in
> the cacheline it was given partial access to.

Right - but the CPU can "put garbage" into the bytes the device was given 
access to, simply by fetching stale bytes from main memory before the device 
overwrites them.

> I really don't see what the problem is and why MIPS needs special
> handling.  You will have to give specific examples, step by step,
> where things can go wrong before I will be able to consider your
> changes.

MIPS doesn't need special handling - it just needs the driver to obey a 
documented restriction when using the DMA API. One could argue that it ought 
to be you who justifies why you feel networking drivers can ignore the 
documentation, and that if you disagree with the documented API you should aim 
to change it rather than ignore it.

Thanks,
    Paul

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] net/tls: move version check after second userspace fetch
From: David Miller @ 2017-09-26 18:37 UTC (permalink / raw)
  To: mengxu.gatech
  Cc: ilyal, aviadye, davejwatson, netdev, linux-kernel, meng.xu,
	sanidhya, taesoo
In-Reply-To: <1506266095-23003-1-git-send-email-mengxu.gatech@gmail.com>

From: Meng Xu <mengxu.gatech@gmail.com>
Date: Sun, 24 Sep 2017 11:14:55 -0400

> @@ -382,6 +376,12 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
>  			rc = -EFAULT;
>  			goto err_crypto_info;
>  		}
> +
> +		/* check version */
> +		if (crypto_info->version != TLS_1_2_VERSION) {
> +			rc = -ENOTSUPP;
> +			goto err_crypto_info;
> +		}
>  		break;
>  	}
>  	default:
> -- 
> 2.7.4
> 

Please, again, deduce this into a single userspace copy.  Otherwise we're going to
add this check for every cipher we add new support for.

Copy the on-stack crypto info ("tmp_crypto_info") into 'crypto_info' and then
bring in from userspace any further bytes that need to be copied.

That guarantees that the version will not change, and the existing check can
stay where it is.

^ permalink raw reply

* Re: [PATCH] net: bcm63xx_enet: Use setup_timer and mod_timer
From: David Miller @ 2017-09-26 18:33 UTC (permalink / raw)
  To: himanshujha199640
  Cc: f.fainelli, bcm-kernel-feedback-list, arnd, yamada.masahiro,
	netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <1506255084-10943-1-git-send-email-himanshujha199640@gmail.com>

From: Himanshu Jha <himanshujha199640@gmail.com>
Date: Sun, 24 Sep 2017 17:41:24 +0530

> Use setup_timer and mod_timer API instead of structure assignments.
> 
> This is done using Coccinelle and semantic patch used
> for this as follows:
> 
> @@
> expression x,y,z,a,b;
> @@
> 
> -init_timer (&x);
> +setup_timer (&x, y, z);
> +mod_timer (&a, b);
> -x.function = y;
> -x.data = z;
> -x.expires = b;
> -add_timer(&a);
> 
> Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] qed: iWARP fixes and enhancements
From: David Miller @ 2017-09-26 18:22 UTC (permalink / raw)
  To: Michal.Kalderon; +Cc: netdev, leon, linux-rdma, dledford, Ariel.Elior
In-Reply-To: <1506244185-2129-1-git-send-email-Michal.Kalderon@cavium.com>

From: Michal Kalderon <Michal.Kalderon@cavium.com>
Date: Sun, 24 Sep 2017 12:09:41 +0300

> This patch series includes several fixes and enhancements
> related to iWARP.
> 
> Patch #1 is actually the last of the initial iWARP submission.
> It has been delayed until now as I wanted to make sure that qedr
> supports iWARP prior to enabling iWARP device detection.
> 
> iWARP changes in RDMA tree have been accepted and targeted at
> kernel 4.15, therefore, all iWARP fixes for this cycle are
> submitted to net-next.
> 
> Changes from v1->v2 
>   - Added "Fixes:" tag to commit message of patch #3
> 
> Signed-off by: Michal.Kalderon@cavium.com
> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>

Series applied, thanks Michal.

^ permalink raw reply

* Re: [PATCH v4 2/3] ipv4: Namespaceify tcp_fastopen_key knob
From: David Miller @ 2017-09-26 18:18 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <F9EA5C4F-2306-4BC5-923A-3AB1D8F50FF1@cmss.chinamobile.com>

From: 严海双 <yanhaishuang@cmss.chinamobile.com>
Date: Tue, 26 Sep 2017 09:25:51 +0800

>> On 2017年9月26日, at 上午7:24, David Miller <davem@davemloft.net> wrote:
>> 
>> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
>> Date: Fri, 22 Sep 2017 21:48:43 +0800
>> 
>>> @@ -9,13 +9,18 @@
>>> #include <net/inetpeer.h>
>>> #include <net/tcp.h>
>>> 
>>> -struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
>>> -
>>> -static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock);
>>> -
>>> -void tcp_fastopen_init_key_once(bool publish)
>>> +void tcp_fastopen_init_key_once(struct net *net)
>> 
>> Why did you remove the 'publish' logic from this function?
>> 
> 
> I think this logic is not necessary now, in proc_tcp_fastopen_key, I have removed 
> tcp_fastopen_init_key_once(false) where the ‘publish’ is false:
> 
> -		/* Generate a dummy secret but don't publish it. This
> -		 * is needed so we don't regenerate a new key on the
> -		 * first invocation of tcp_fastopen_cookie_gen
> -		 */
> -		tcp_fastopen_init_key_once(false);
> -		tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH);
> +		tcp_fastopen_reset_cipher(net, user_key, TCP_FASTOPEN_KEY_LENGTH);
> 
> It said we don't regenerate a new key on first invocation of tcp_fastopen_cookie_gen, 
> but in tcp_fastopen_cookie_gen,it didn’t  call tcp_fastopen_init_key_once since
> from commit dfea2aa654243 (tcp: Do not call tcp_fastopen_reset_cipher from interrupt context):
> 
> And in other places where call tcp_fastopen_init_key_once, the ‘publish’ is always true:

Ok, this simplification seems legitimate.

But it is unrelated to this namespacification.  So it should be in a separate patch,
and should be documented well in the commit message using the great explanation you
gave to me above.

Please respin this series, with this patch #2 split up into two changes.

Thank you.

^ permalink raw reply

* [PATCH net-next] net-next/hinic: Fix a case of Tx Queue is Stopped forever
From: Aviad Krawczyk @ 2017-09-26 18:11 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, netdev, Aviad Krawczyk

Fix the following scenario:
1. tx_free_poll is running on cpu X
2. xmit function is running on cpu Y and fails to get sq wqe
3. tx_free_poll frees wqes on cpu X and checks the queue is not stopped
4. xmit function stops the queue after failed to get sq wqe
5. The queue is stopped forever

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_tx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index abe3e38..9128858 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -212,10 +212,19 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	sq_wqe = hinic_sq_get_wqe(txq->sq, wqe_size, &prod_idx);
 	if (!sq_wqe) {
-		tx_unmap_skb(nic_dev, skb, txq->sges);
-
 		netif_stop_subqueue(netdev, qp->q_id);
 
+		/* Check for the case free_tx_poll is called in another cpu
+		 * and we stopped the subqueue after free_tx_poll check.
+		 */
+		sq_wqe = hinic_sq_get_wqe(txq->sq, wqe_size, &prod_idx);
+		if (sq_wqe) {
+			netif_wake_subqueue(nic_dev->netdev, qp->q_id);
+			goto process_sq_wqe;
+		}
+
+		tx_unmap_skb(nic_dev, skb, txq->sges);
+
 		u64_stats_update_begin(&txq->txq_stats.syncp);
 		txq->txq_stats.tx_busy++;
 		u64_stats_update_end(&txq->txq_stats.syncp);
@@ -223,6 +232,7 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 		goto flush_skbs;
 	}
 
+process_sq_wqe:
 	hinic_sq_prepare_wqe(txq->sq, prod_idx, sq_wqe, txq->sges, nr_sges);
 
 	hinic_sq_write_wqe(txq->sq, prod_idx, sq_wqe, skb, wqe_size);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: Andrew Lunn @ 2017-09-26 18:01 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170926174837.15999-1-vivien.didelot@savoirfairelinux.com>

On Tue, Sep 26, 2017 at 01:48:37PM -0400, Vivien Didelot wrote:
> mv88e6xxx_g2_irq_free locks the registers mutex, but not
> mv88e6xxx_g1_irq_free, which results in a stack trace from
> assert_reg_lock when unloading the mv88e6xxx module. Fix this.
> 
> Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index c6678aa9b4ef..b4359e4e5165 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -338,9 +338,11 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
>  	int irq, virq;
>  	u16 mask;
>  
> +	mutex_lock(&chip->reg_lock);
>  	mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
>  	mask |= GENMASK(chip->g1_irq.nirqs, 0);
>  	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
> +	mutex_unlock(&chip->reg_lock);
>  
>  	free_irq(chip->irq, chip);
>  

Hi Vivien

static int mv88e6xxx_probe(struct mdio_device *mdiodev)
{

...

out_g1_irq:
	if (chip->irq > 0) {
		mutex_lock(&chip->reg_lock);
		mv88e6xxx_g1_irq_free(chip);
		mutex_unlock(&chip->reg_lock);
	}

It looks like this will deadlock?

In general, i tried to keep the mutex out of the interrupt code.  That
is not totally possible, the IRQ thread handler needs it. But
otherwise, the IRQ code assumes it is called with the mutex taken.

So i think it is better to hold the mutex in mv88e6xxx_remove() when
calling mv88e6xxx_g1_irq_free().

   Andrew

^ permalink raw reply

* Re: [PATCH net] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: Vivien Didelot @ 2017-09-26 18:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170926180136.GD3325@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> In general, i tried to keep the mutex out of the interrupt code.  That
> is not totally possible, the IRQ thread handler needs it. But
> otherwise, the IRQ code assumes it is called with the mutex taken.
>
> So i think it is better to hold the mutex in mv88e6xxx_remove() when
> calling mv88e6xxx_g1_irq_free().

I'd prefer that too. Respinning, thanks!


    Vivien

^ permalink raw reply

* Re: [PATCH v2 2/2] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-26 17:58 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <CAJieiUhJCgrykvLNEFaABg6MXKR=FhgbK=bCZF0_QrYFaeG2tw@mail.gmail.com>

Hi Roopa,

Thanks for the feedback, I have just one question:

On 09/26/2017 05:15 PM, Roopa Prabhu wrote:
>> +static int ipgre_tunnel_encap_add_mpls_ops(void)
>> > +{
>> > +       int ret = -1;
>> > +
>> > +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
>> > +       ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
>> > +#endif
>> > +
>> > +       return ret;
>> > +}
>> > +
>> > +static void ipgre_tunnel_encap_del_mpls_ops(void)
>> > +{
>> > +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
>> > +       ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
>> > +#endif
>> > +}
>> > +
>> >  static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
>> >                        struct nlmsghdr *nlh, struct net *net, u32 portid,
>> >                        unsigned int nlm_flags);
>> > @@ -2486,6 +2521,10 @@ static int __init mpls_init(void)
>> >                       0);
>> >         rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>> >                       mpls_netconf_dump_devconf, 0);
>> > +       err = ipgre_tunnel_encap_add_mpls_ops();
>> > +       if (err)
>> > +               pr_err("Can't add mpls over gre tunnel ops\n");
>> > +
> This will throw an error  if CONFIG_NET_IP_TUNNEL is not enabled.
> Can you pls put the CONFIG_NET_IP_TUNNEL around
> ipgre_tunnel_encap_add_mpls_ops ?

You want the CONFIG_NET_IP_TUNNEL definition around the declaration of 
the function or when it's called ? or both ? (I can adjust the code for 
any case).

> see CONFIG_INET checks in the rest of af_mpls as example.
> same for del_ops
>

Thanks,

Amine

^ permalink raw reply

* [PATCH net-next] net-next/hinic: Set Rxq irq to specific cpu for NUMA
From: Aviad Krawczyk @ 2017-09-26 17:57 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, netdev, Aviad Krawczyk

Set Rxq irq to specific cpu for allocating and receiving the skb from
the same node.

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_rx.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index 1d4f712..e2e5cdc 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -26,6 +26,7 @@
 #include <linux/skbuff.h>
 #include <linux/dma-mapping.h>
 #include <linux/prefetch.h>
+#include <linux/cpumask.h>
 #include <asm/barrier.h>
 
 #include "hinic_common.h"
@@ -171,11 +172,10 @@ static int rx_alloc_pkts(struct hinic_rxq *rxq)
 	struct hinic_sge sge;
 	dma_addr_t dma_addr;
 	struct sk_buff *skb;
-	int i, alloc_more;
 	u16 prod_idx;
+	int i;
 
 	free_wqebbs = hinic_get_rq_free_wqebbs(rxq->rq);
-	alloc_more = 0;
 
 	/* Limit the allocation chunks */
 	if (free_wqebbs > nic_dev->rx_weight)
@@ -185,7 +185,6 @@ static int rx_alloc_pkts(struct hinic_rxq *rxq)
 		skb = rx_alloc_skb(rxq, &dma_addr);
 		if (!skb) {
 			netdev_err(rxq->netdev, "Failed to alloc Rx skb\n");
-			alloc_more = 1;
 			goto skb_out;
 		}
 
@@ -195,7 +194,6 @@ static int rx_alloc_pkts(struct hinic_rxq *rxq)
 					  &prod_idx);
 		if (!rq_wqe) {
 			rx_free_skb(rxq, skb, dma_addr);
-			alloc_more = 1;
 			goto skb_out;
 		}
 
@@ -211,9 +209,7 @@ static int rx_alloc_pkts(struct hinic_rxq *rxq)
 		hinic_rq_update(rxq->rq, prod_idx);
 	}
 
-	if (alloc_more)
-		tasklet_schedule(&rxq->rx_task);
-
+	tasklet_schedule(&rxq->rx_task);
 	return i;
 }
 
@@ -357,7 +353,7 @@ static int rxq_recv(struct hinic_rxq *rxq, int budget)
 	}
 
 	if (pkts)
-		tasklet_schedule(&rxq->rx_task); /* hinic_rx_alloc_pkts */
+		tasklet_schedule(&rxq->rx_task); /* rx_alloc_pkts */
 
 	u64_stats_update_begin(&rxq->rxq_stats.syncp);
 	rxq->rxq_stats.pkts += pkts;
@@ -417,6 +413,8 @@ static int rx_request_irq(struct hinic_rxq *rxq)
 	struct hinic_dev *nic_dev = netdev_priv(rxq->netdev);
 	struct hinic_hwdev *hwdev = nic_dev->hwdev;
 	struct hinic_rq *rq = rxq->rq;
+	struct hinic_qp *qp;
+	struct cpumask mask;
 	int err;
 
 	rx_add_napi(rxq);
@@ -432,7 +430,9 @@ static int rx_request_irq(struct hinic_rxq *rxq)
 		return err;
 	}
 
-	return 0;
+	qp = container_of(rq, struct hinic_qp, rq);
+	cpumask_set_cpu(qp->q_id % num_online_cpus(), &mask);
+	return irq_set_affinity_hint(rq->irq, &mask);
 }
 
 static void rx_free_irq(struct hinic_rxq *rxq)
-- 
1.9.1

^ permalink raw reply related

* [PATCH net] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: Vivien Didelot @ 2017-09-26 17:48 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

mv88e6xxx_g2_irq_free locks the registers mutex, but not
mv88e6xxx_g1_irq_free, which results in a stack trace from
assert_reg_lock when unloading the mv88e6xxx module. Fix this.

Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c6678aa9b4ef..b4359e4e5165 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -338,9 +338,11 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 	int irq, virq;
 	u16 mask;
 
+	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
 	mask |= GENMASK(chip->g1_irq.nirqs, 0);
 	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
+	mutex_unlock(&chip->reg_lock);
 
 	free_irq(chip->irq, chip);
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v2 00/16] Thunderbolt networking
From: Andy Shevchenko @ 2017-09-26 17:37 UTC (permalink / raw)
  To: Mika Westerberg, Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andrew Lunn, linux-kernel,
	netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

On Mon, 2017-09-25 at 14:07 +0300, Mika Westerberg wrote:
> Hi all,
> 
> In addition of tunneling PCIe, Display Port and USB traffic,
> Thunderbolt
> allows connecting two hosts (domains) over a Thunderbolt cable. It is
> possible to tunnel arbitrary data packets over such connection using
> high-speed DMA rings available in the Thunderbolt host controller.
> 
> In order to discover Thunderbolt services the other host supports,
> there is
> a software protocol running on top of the automatically configured
> control
> channel (ring 0). This protocol is called XDomain discovery protocol
> and it
> uses XDomain properties to describe the host (domain) and the services
> it
> supports.
> 
> Once both sides have agreed what services are supported they can
> enable
> high-speed DMA rings to transfer data over the cable.
> 
> This series adds support for the XDomain protocol so that we expose
> each
> remote connection as Thunderbolt XDomain device and each service as
> Thunderbolt service device. On top of that we create an API that
> allows
> writing drivers for these services and finally we provide an example
> Thunderbolt service driver that creates virtual ethernet inferface
> that
> allows tunneling networking packets over Thunderbolt cable. The API
> could
> be used for creating other Thunderbolt services, such as tunneling
> SCSI for
> example.
> 
> The XDomain protocol and networking support is also available in macOS
> and
> Windows so this makes it possible to connect Linux to macOS and
> Windows as
> well.
> 
> The patches are based on previous Thunderbolt networking patch series
> by
> Amir Levy and Michael Jamet, that can be found here:
> 
>   https://lwn.net/Articles/705998/
> 
> The main difference to that patch series is that we have the XDomain
> protocol running in the kernel now so there is no need for a separate
> userspace daemon.
> 
> Note this does not affect the existing functionality, so security
> levels
> and NVM firmware upgrade continue to work as before (with the small
> exception that now sysfs also shows the XDomain connections and
> services in
> addition to normal Thunderbolt devices). It is also possible to
> connect up
> to 5 Thunderbolt devices and then another host, and the network driver
> works exactly the same.
> 
> This is second version of the patch series. The previous version (v1)
> can
> be found here:
> 
>   https://lwn.net/Articles/734019/
> 
> Changes from the v1:
> 
>   * Add include/linux/thunderbolt.h to MAINTAINERS
>   * Correct Linux version and date of new sysfs entries in
>     Documentation/ABI/testing/sysfs-bus-thunderbolt
>   * Move network driver from drivers/thunderbolt/net.c to
>     drivers/net/thunderbolt.c and update it to follow coding style in
>     drivers/net/*.
>   * Add MAINTAINERS entry for the network driver
>   * Minor cleanups
> 
> In case someone wants to try this out, patch [16/16] adds
> documentation how
> the networking driver can be used. In short, if you connect Linux to a
> macOS or Windows, everything is done automatically (as those systems
> have
> the networking service enabled by default). For Linux to Linux
> connection
> one host needs to load the networking driver first (so that the other
> side
> can locate the networking service and load the corresponding driver).


Looks awesome!

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

(I told privately to Mika about some minors and we agreed with him that
they will be considered only if there will be more comments on the
series)

> 
> Amir Levy (1):
>   net: Add support for networking over Thunderbolt cable
> 
> Mika Westerberg (15):
>   byteorder: Move {cpu_to_be32,be32_to_cpu}_array() from Thunderbolt
> to core
>   thunderbolt: Add support for XDomain properties
>   thunderbolt: Move enum tb_cfg_pkg_type to thunderbolt.h
>   thunderbolt: Move thunderbolt domain structure to thunderbolt.h
>   thunderbolt: Move tb_switch_phy_port_from_link() to thunderbolt.h
>   thunderbolt: Add support for XDomain discovery protocol
>   thunderbolt: Configure interrupt throttling for all interrupts
>   thunderbolt: Add support for frame mode
>   thunderbolt: Export ring handling functions to modules
>   thunderbolt: Move ring descriptor flags to thunderbolt.h
>   thunderbolt: Use spinlock in ring serialization
>   thunderbolt: Use spinlock in NHI serialization
>   thunderbolt: Add polling mode for rings
>   thunderbolt: Add function to retrieve DMA device for the ring
>   thunderbolt: Allocate ring HopID automatically if requested
> 
>  Documentation/ABI/testing/sysfs-bus-thunderbolt |   48 +
>  Documentation/admin-guide/thunderbolt.rst       |   24 +
>  MAINTAINERS                                     |    7 +
>  drivers/net/Kconfig                             |   12 +
>  drivers/net/Makefile                            |    3 +
>  drivers/net/thunderbolt.c                       | 1379
> ++++++++++++++++++++
>  drivers/thunderbolt/Makefile                    |    2 +-
>  drivers/thunderbolt/ctl.c                       |   46 +-
>  drivers/thunderbolt/ctl.h                       |    3 +-
>  drivers/thunderbolt/domain.c                    |  197 ++-
>  drivers/thunderbolt/icm.c                       |  218 +++-
>  drivers/thunderbolt/nhi.c                       |  409 ++++--
>  drivers/thunderbolt/nhi.h                       |  141 +-
>  drivers/thunderbolt/nhi_regs.h                  |   11 +-
>  drivers/thunderbolt/property.c                  |  670 ++++++++++
>  drivers/thunderbolt/switch.c                    |    7 +-
>  drivers/thunderbolt/tb.h                        |   88 +-
>  drivers/thunderbolt/tb_msgs.h                   |  140 +-
>  drivers/thunderbolt/xdomain.c                   | 1576
> +++++++++++++++++++++++
>  include/linux/byteorder/generic.h               |   16 +
>  include/linux/mod_devicetable.h                 |   26 +
>  include/linux/thunderbolt.h                     |  598 +++++++++
>  scripts/mod/devicetable-offsets.c               |    7 +
>  scripts/mod/file2alias.c                        |   25 +
>  24 files changed, 5304 insertions(+), 349 deletions(-)
>  create mode 100644 drivers/net/thunderbolt.c
>  create mode 100644 drivers/thunderbolt/property.c
>  create mode 100644 drivers/thunderbolt/xdomain.c
>  create mode 100644 include/linux/thunderbolt.h
> 

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply

* Re: [PATCH net v3] l2tp: fix race condition in l2tp_tunnel_delete
From: David Miller @ 2017-09-26 17:34 UTC (permalink / raw)
  To: sd; +Cc: netdev, g.nault, lucien.xin, tparkin
In-Reply-To: <d6011f741f7ea38a61006b9f2cde8e544b4d206b.1506432922.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 26 Sep 2017 16:16:43 +0200

> If we try to delete the same tunnel twice, the first delete operation
> does a lookup (l2tp_tunnel_get), finds the tunnel, calls
> l2tp_tunnel_delete, which queues it for deletion by
> l2tp_tunnel_del_work.
> 
> The second delete operation also finds the tunnel and calls
> l2tp_tunnel_delete. If the workqueue has already fired and started
> running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
> same tunnel a second time, and try to free the socket again.
> 
> Add a dead flag to prevent firing the workqueue twice. Then we can
> remove the check of queue_work's result that was meant to prevent that
> race but doesn't.
> 
> Reproducer:
> 
>     ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
>     ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
>     ip link set l2tp1 up
>     ip l2tp del tunnel tunnel_id 3000
>     ip l2tp del tunnel tunnel_id 3000
> 
> Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue")
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] net: stmmac: Meet alignment requirements for DMA
From: David Miller @ 2017-09-26 17:34 UTC (permalink / raw)
  To: matt.redfearn
  Cc: netdev, alexandre.torgue, peppe.cavallaro, linux-kernel,
	linux-mips, paul.burton, james.hogan
In-Reply-To: <587dc9a8-b974-e222-95b4-93c2a8f2aba2@imgtec.com>

From: Matt Redfearn <matt.redfearn@imgtec.com>
Date: Tue, 26 Sep 2017 14:57:39 +0100

> Since the MIPS architecture requires software cache management,
> performing a dma_map_single(TO_DEVICE) will writeback and invalidate
> the cachelines for the required region. To comply with (our
> interpretation of) the DMA API documentation, the MIPS implementation
> expects a cacheline aligned & sized buffer, so that it can writeback a
> set of complete cachelines. Indeed a recent patch
> (https://patchwork.linux-mips.org/patch/14624/) causes the MIPS dma
> mapping code to emit warnings if the buffer it is requested to sync is
> not cachline aligned. This driver, as is, fails this test and causes
> thousands of warnings to be emitted.

For a device write, if the device does what it is told and does not
access bytes outside of the periphery of the DMA mapping, nothing bad
can happen.

When the DMA buffer is mapped, the cpu side cachelines are flushed (even
ones which are partially part of the requsted mapping, this is _fine_).

The device writes into only the bytes it was given access to, which
starts at the DMA address.

The unmap and/or sync operation after the DMA transfer needs to do
nothing on the cpu side since the map operation flushed the cpu side
caches already.

When the cpu reads, it will pull the cacheline from main memory and
see what the device wrote.

It's not like the device can put "garbage" into the bytes earlier in
the cacheline it was given partial access to.

A device read is similar, the cpu cachelines are written out to main
memory at map time and the device sees what it needs to see.

I want to be shown a real example where corruption or bad data can
occur when the DMA API is used correctly.

Other platforms write "complete cachelines" in order to implement
the cpu and/or host controller parts of the DMA API implementation.
I see nothing special about MIPS at all.

If you're given a partial cache line, you align the addresses such
that you flush every cache line contained within the provided address
range.

I really don't see what the problem is and why MIPS needs special
handling.  You will have to give specific examples, step by step,
where things can go wrong before I will be able to consider your
changes.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] ldmvsw: Remove redundant unlikely()
From: David Miller @ 2017-09-26 17:23 UTC (permalink / raw)
  To: tklauser; +Cc: netdev, shannon.nelson
In-Reply-To: <20170926131415.8649-1-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Tue, 26 Sep 2017 15:14:15 +0200

> IS_ERR() already implies unlikely(), so it can be omitted.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: Remove redundant unlikely()
From: David Miller @ 2017-09-26 17:23 UTC (permalink / raw)
  To: tklauser-93Khv+1bN0NyDzI6CaY1VQ
  Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170926131323.8527-1-tklauser-93Khv+1bN0NyDzI6CaY1VQ@public.gmane.org>

From: Tobias Klauser <tklauser-93Khv+1bN0NyDzI6CaY1VQ@public.gmane.org>
Date: Tue, 26 Sep 2017 15:13:23 +0200

> IS_ERR() already implies unlikely(), so it can be omitted.
> 
> Signed-off-by: Tobias Klauser <tklauser-93Khv+1bN0NyDzI6CaY1VQ@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] bnxt_en: Remove redundant unlikely()
From: David Miller @ 2017-09-26 17:23 UTC (permalink / raw)
  To: tklauser; +Cc: michael.chan, netdev
In-Reply-To: <20170926131226.8423-1-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Tue, 26 Sep 2017 15:12:26 +0200

> IS_ERR() already implies unlikely(), so it can be omitted.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access
From: Andy Gospodarek @ 2017-09-26 17:21 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, alexei.starovoitov, john.fastabend, peter.waskiewicz.jr,
	jakub.kicinski, netdev, mchan
In-Reply-To: <59C94FF4.8070900@iogearbox.net>

On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:
> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:
> [...]
> > First, thanks for this detailed description.  It was helpful to read
> > along with the patches.
> > 
> > My only concern about this area being generic is that you are now in a
> > state where any bpf program must know about all the bpf programs in the
> > receive pipeline before it can properly parse what is stored in the
> > meta-data and add it to an skb (or perform any other action).
> > Especially if each program adds it's own meta-data along the way.
> > 
> > Maybe this isn't a big concern based on the number of users of this
> > today, but it just starts to seem like a concern as there are these
> > hints being passed between layers that are challenging to track due to a
> > lack of a standard format for passing data between.
> 
> Btw, we do have similar kind of programmable scratch buffer also today
> wrt skb cb[] that you can program from tc side, the perf ring buffer,
> which doesn't have any fixed layout for the slots, or a per-cpu map
> where you can transfer data between tail calls for example, then tail
> calls themselves that need to coordinate, or simply mangling of packets
> itself if you will, but more below to your use case ...
> 
> > The main reason I bring this up is that Michael and I had discussed and
> > designed a way for drivers to communicate between each other that rx
> > resources could be freed after a tx completion on an XDP_REDIRECT
> > action.  Much like this code, it involved adding an new element to
> > struct xdp_md that could point to the important information.  Now that
> > there is a generic way to handle this, it would seem nice to be able to
> > leverage it, but I'm not sure how reliable this meta-data area would be
> > without the ability to mark it in some manner.
> > 
> > For additional background, the minimum amount of data needed in the case
> > Michael and I were discussing was really 2 words.  One to serve as a
> > pointer to an rx_ring structure and one to have a counter to the rx
> > producer entry.  This data could be acessed by the driver processing the
> > tx completions and callback to the driver that received the frame off the wire
> > to perform any needed processing.  (For those curious this would also require a
> > new callback/netdev op to act on this data stored in the XDP buffer.)
> 
> What you describe above doesn't seem to be fitting to the use-case of
> this set, meaning the area here is fully programmable out of the BPF
> program, the infrastructure you're describing is some sort of means of
> communication between drivers for the XDP_REDIRECT, and should be
> outside of the control of the BPF program to mangle.

OK, I understand that perspective.  I think saying this is really meant
as a BPF<->BPF communication channel for now is fine.

> You could probably reuse the base infra here and make a part of that
> inaccessible for the program with some sort of a fixed layout, but I
> haven't seen your code yet to be able to fully judge. Intention here
> is to allow for programmability within the BPF prog in a generic way,
> such that based on the use-case it can be populated in specific ways
> and propagated to the skb w/o having to define a fixed layout and
> bloat xdp_buff all the way to an skb while still retaining all the
> flexibility.

Some level of reuse might be proper, but I'd rather it be explicit for
my use since it's not exclusively something that will need to be used by
a BPF prog, but rather the driver.  I'll produce some patches this week
for reference.

^ permalink raw reply

* Re: [PATCH] vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
From: David Miller @ 2017-09-26 17:13 UTC (permalink / raw)
  To: alexey.kodanev; +Cc: netdev, steffen.klassert, herbert
In-Reply-To: <1506428069-4716-1-git-send-email-alexey.kodanev@oracle.com>

From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Tue, 26 Sep 2017 15:14:29 +0300

> When running LTP IPsec tests, KASan might report:
> 
> BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
> Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
> ...
 ...
> Can be fixed if we get skb->len before dst_output().
> 
> Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code")
> Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>

Applied and queued up for -stable.

^ permalink raw reply

* [iproute PATCH v2 2/3] tc: flower: No need to cache indev arg
From: Phil Sutter @ 2017-09-26 16:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170926163548.24347-1-phil@nwl.cc>

Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 tc/f_flower.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 934832e2bbe90..99e62a382dec6 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -629,11 +629,8 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 		} else if (matches(*argv, "skip_sw") == 0) {
 			flags |= TCA_CLS_FLAGS_SKIP_SW;
 		} else if (matches(*argv, "indev") == 0) {
-			char ifname[IFNAMSIZ] = {};
-
 			NEXT_ARG();
-			strncpy(ifname, *argv, sizeof(ifname) - 1);
-			addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, ifname);
+			addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, *argv);
 		} else if (matches(*argv, "vlan_id") == 0) {
 			__u16 vid;
 
-- 
2.13.1

^ permalink raw reply related

* Re: [PATCH net-next] kcm: Remove redundant unlikely()
From: David Miller @ 2017-09-26 16:54 UTC (permalink / raw)
  To: tklauser; +Cc: netdev
In-Reply-To: <20170926092258.21391-1-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Tue, 26 Sep 2017 11:22:58 +0200

> IS_ERR() already implies unlikely(), so it can be omitted.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox