Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net 07/13] net: sched: cls_u32: get rid of tc_u_common ->rcu
From: Jamal Hadi Salim @ 2018-09-09 11:45 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-8-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> unused
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net 06/13] net: sched: cls_u32: get rid of tc_u_knode ->tp
From: Jamal Hadi Salim @ 2018-09-09 11:43 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-7-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> not used anymore
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net 05/13] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
From: Jamal Hadi Salim @ 2018-09-09 11:41 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-6-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net 04/13] net: sched: cls_u32: make sure that divisor is a power of 2
From: Jamal Hadi Salim @ 2018-09-09 11:41 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-5-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>  > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Test is by hacking user space iproute2 to allow
sending a divisor > 255

Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net 03/13] net: sched: cls_u32: disallow linking to root hnode
From: Jamal Hadi Salim @ 2018-09-09 11:39 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-4-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Operation makes no sense.  Nothing will actually break if we do so
> (depth limit in u32_classify() will prevent infinite loops), but
> according to maintainers it's best prohibited outright.
> 
> NOTE: doing so guarantees that u32_destroy() will trigger the call
> of u32_destroy_hnode(); we might want to make that unconditional.
> 
> Test:
> tc qdisc add dev eth0 ingress
> tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \
> 	link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
> should fail with
> Error: cls_u32: Not linking to root node
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net 02/13] net: sched: cls_u32: mark root hnode explicitly
From: Jamal Hadi Salim @ 2018-09-09 11:39 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180909013132.3222-3-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] r8169: inform about CLKRUN protocol issue when behind a CardBus bridge
From: David Miller @ 2018-09-09 15:09 UTC (permalink / raw)
  To: mail; +Cc: nic_swsd, netdev, linux-kernel
In-Reply-To: <fcf6a23a-4721-5912-281e-488eb6a736c6@maciej.szmigiero.name>

From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Date: Thu, 6 Sep 2018 18:10:53 +0200

> It turns out that at least some r8169 CardBus cards don't operate correctly
> when CLKRUN protocol is enabled - the symptoms are recurring timeouts
> during PHY reads / writes and a very high packet drop rate.
> This is true of at least RTL8169sc/8110sc (XID 18000000) chip in
> Sunrich C-160 CardBus NIC.
> 
> Such behavior was observed on two separate laptops, the first one has
> TI PCIxx12 CardBus bridge, while the second one has Ricoh RL5c476II.
> 
> Setting CLKRUN_En bit in CONFIG 3 register via an EEPROM write didn't
> improve things in either case (this is probably why it wasn't set by the
> card manufacturer).
> The only way to fix the issue was to disable the CLKRUN protocol either
> in the CardBus bridge (only possible in the TI one) or in the southbridge.
> 
> Since the problem takes some time to debug let's warn people that have
> the suspect configuration (Conventional PCI r8169 NIC behind a CardBus
> bridge) so they know what they can do if they encounter it.
> 
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>

I don't know about this.

Barking at the user in the kernel log about an obscure knob (which btw
doesn't exist for all cardbus bridges without other patches you are
posting elsewhere) is rarely effective.

We should just disable clkrun automatically we know it causes problems.

Sorry, I don't think this is that right approach and therefore I am not
applying this.

^ permalink raw reply

* Re: [PATCH net-next v9 0/6] net: vhost: improve performance when enable busyloop
From: David Miller @ 2018-09-09 15:01 UTC (permalink / raw)
  To: xiangxia.m.yue; +Cc: netdev, virtualization, mst
In-Reply-To: <1536493887-2637-1-git-send-email-xiangxia.m.yue@gmail.com>

From: xiangxia.m.yue@gmail.com
Date: Sun,  9 Sep 2018 04:51:21 -0700

> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> This patches improve the guest receive performance.
> On the handle_tx side, we poll the sock receive queue
> at the same time. handle_rx do that in the same way.
> 
> For more performance report, see patch 4, 5, 6

I only see patches 1-4.

^ permalink raw reply

* Re: [PATCH net-next 13/15] net: Add and use skb_list_del_init().
From: Sergei Shtylyov @ 2018-09-09  9:45 UTC (permalink / raw)
  To: David Miller, netdev
In-Reply-To: <20180908.131104.39421839190166239.davem@davemloft.net>

Hello!

On 9/8/2018 11:11 PM, David Miller wrote:

> It documents what is happening, and eliminates the spurious list
> pointer poisoning.
> 
> In the long term, in order to get proper list head debugging, we
> might want to use the list poinson value as the indicator that

    Poison?

> an SKB is a singleton and not on a list.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

[...]

MBR, Sergei

^ permalink raw reply

* Re: Overlayfs @ Containers and checkpoint/restart micro-conference at LPC2018
From: Christian Brauner @ 2018-09-09  9:18 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Stephane Graber, containers, Miklos Szeredi, Netdev, overlayfs,
	lxc-users, LSM List, lxc-devel, linux-fsdevel, zhangyi (F)
In-Reply-To: <CAOQ4uxih7rPPLquJZFL9HtSbUFi-cCOR21qH4_-ORxxLKGsLXg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4628 bytes --]

On Sun, Sep 09, 2018 at 09:31:02AM +0300, Amir Goldstein wrote:
> On Sun, Sep 9, 2018 at 4:31 AM Christian Brauner <christian@brauner.io> wrote:
> >
> ...
> > > [cc: overlayfs developers]
> > >
> > > Hi Stéphane!
> >
> > Hey Amir,
> >
> > I'm one of the co-organizers of the microconf.
> >
> > >
> > > I am not planing to travel to LPC this year, so this is more of an FYI than
> > > a CFP, but maybe another overlayfs developer can pick up this glove??
> >
> > Sure, that would be great.
> >
> > >
> > > For the past two years I have participated in the effort to fix overlayfs
> > > "non-standard" behavior:
> > > https://github.com/amir73il/overlayfs/wiki/Overlayfs-non-standard-behavior
> >
> > Yes, this is an issue that we were aware of for a long time and it
> > something that has made overlayfs somewhat more difficult to use than it
> > should be.
> >
> > >
> > > Allegedly, this effort went underway to improve the experience of overlayfs
> > > users, who are mostly applications running inside containers. For backward
> > > compatibility reasons, container runtimes will need to opt-in for fixing some
> > > of the legacy behavior.
> > >
> > > In reality, I have seen very little cross list interaction between linux-unionfs
> > > and containers mailing lists. The only interaction I recall in the
> > > past two years
> > > ended up in a fix in overlayfs to require opt-in for fixing yet another backward
> > > compatible bad behavior, although docker did follow up shortly after to fix
> > > bad practice in container runtime:
> > > https://github.com/moby/moby/issues/34672
> > >
> > > So the questions I would like to relay to the micro-conf participants w.r.t the
> > > new opt-in overlayfs behavior:
> > > 1. Did you know?
> >
> > I personally did not know about the new opt-in behavior. More reason to
> > give a talk! :)
> >
> > > 2. Do you care?
> >
> > Yes, we do care. However - speaking as LXC upstream now - we have
> > recently focused on getting shiftfs to work rather than overlayfs.
> >
> 
> IMO, as I expressed it in the past, the fact that shiftfs development is not
> collaborated with overlayfs developers is a pitty.
> Yes shiftfs has a different purpose than overlayfs, but they have common
> use cases and common problems as well.

My team hast just started to be more involved with shifts development a
few months back. Overlayfs is definitely an inspiration and we even once
thought about making shifts an extension of overlayfs.
Seth Forshee on my team is currently actively working on shifts and
getting a POC ready.
When he has a POC based on James' patchset there will be an RFC that
will go to fsdevel and all parties of interest.
There will also be an update on shifts development during the microconf.
So even more reason for developers from overlayfs to stop by.

> 
> > We are more than happy to have a overlayfs talk at the microconf. If
> > someone were to talk about:
> > - What non-standard behavior has already been fixed?
> > - How has it been fixed?
> 
> IMO, those questions are covered quite well by the wiki and overlayfs.txt
> documentation in kernel tree.

It's still worth bringing this in front of other developers in the form
of a talk.

> 
> > - What non-standard behavior still needs to be fixed?
> 
> There's the mmap MAP_SHARED case covered in the wiki
> and there may be other small stuff, but not sure if anyone cares
> about them, so the question should really be directed back to the audience...

The audience that cares enough to send patches for it will likely be at
the microconf so it's a good place to discuss it.

> 
> > - Outstanding problems that either still need a solution or
> >   are solved but one would like feedback on the implementation. This way
> >   we can have a good discussion.
> >
> 
> I think one of the chsallange that distros and container runtime will need to
> deal with is managing format versions of overlay "images".
> The reason the new features require user or distro to opt-in is because
> the new features create overlayfs images that are not fully compatible with old
> kernels and existing container image tools (i.e. export/migrate image).

As I said we will be very thankful for any talk about such problems.

Thanks!
Christian

> 
> The new overlayfs-progs project by Zhangyi is going to help in that respect:
> https://github.com/hisilicon/overlayfs-progs
> As well as Zhangyi's work on overlayfs feature set support:
> https://marc.info/?l=linux-unionfs&m=153302911328159&w=2
> 
> Thanks,
> Amir.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net V2 3/7] net: ena: fix device destruction to gracefully free resources
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

When ena_destroy_device() is called from ena_suspend(), the device is
still reachable from the driver. Therefore, the driver can send a command
to the device to free all resources.
However, in all other cases of calling ena_destroy_device(), the device is
potentially in an error state and unreachable from the driver. In these
cases the driver must not send commands to the device.

The current implementation does not request resource freeing from the
device even when possible. We add the graceful parameter to
ena_destroy_device() to enable resource freeing when possible, and
use it in ena_suspend().

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 69e684fd2787..035d47d2179a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -76,7 +76,7 @@ MODULE_DEVICE_TABLE(pci, ena_pci_tbl);
 
 static int ena_rss_init_default(struct ena_adapter *adapter);
 static void check_for_admin_com_state(struct ena_adapter *adapter);
-static void ena_destroy_device(struct ena_adapter *adapter);
+static void ena_destroy_device(struct ena_adapter *adapter, bool graceful);
 static int ena_restore_device(struct ena_adapter *adapter);
 
 static void ena_tx_timeout(struct net_device *dev)
@@ -1900,7 +1900,7 @@ static int ena_close(struct net_device *netdev)
 			  "Destroy failure, restarting device\n");
 		ena_dump_stats_to_dmesg(adapter);
 		/* rtnl lock already obtained in dev_ioctl() layer */
-		ena_destroy_device(adapter);
+		ena_destroy_device(adapter, false);
 		ena_restore_device(adapter);
 	}
 
@@ -2550,7 +2550,7 @@ static int ena_enable_msix_and_set_admin_interrupts(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_destroy_device(struct ena_adapter *adapter)
+static void ena_destroy_device(struct ena_adapter *adapter, bool graceful)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
@@ -2563,7 +2563,8 @@ static void ena_destroy_device(struct ena_adapter *adapter)
 	dev_up = test_bit(ENA_FLAG_DEV_UP, &adapter->flags);
 	adapter->dev_up_before_reset = dev_up;
 
-	ena_com_set_admin_running_state(ena_dev, false);
+	if (!graceful)
+		ena_com_set_admin_running_state(ena_dev, false);
 
 	if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
 		ena_down(adapter);
@@ -2665,7 +2666,7 @@ static void ena_fw_reset_device(struct work_struct *work)
 		return;
 	}
 	rtnl_lock();
-	ena_destroy_device(adapter);
+	ena_destroy_device(adapter, false);
 	ena_restore_device(adapter);
 	rtnl_unlock();
 }
@@ -3467,7 +3468,7 @@ static int ena_suspend(struct pci_dev *pdev,  pm_message_t state)
 			"ignoring device reset request as the device is being suspended\n");
 		clear_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
 	}
-	ena_destroy_device(adapter);
+	ena_destroy_device(adapter, true);
 	rtnl_unlock();
 	return 0;
 }
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 0/7] bug fixes for ENA Ethernet driver
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik

From: Netanel Belgazal <netanel@amazon.com>

Netanel Belgazal (7):
  net: ena: fix surprise unplug NULL dereference kernel crash
  net: ena: fix driver when PAGE_SIZE == 64kB
  net: ena: fix device destruction to gracefully free resources
  net: ena: fix potential double ena_destroy_device()
  net: ena: fix missing lock during device destruction
  net: ena: fix missing calls to READ_ONCE
  net: ena: fix incorrect usage of memory barriers

 drivers/net/ethernet/amazon/ena/ena_com.c     | 24 ++++----
 drivers/net/ethernet/amazon/ena/ena_eth_com.c |  6 ++
 drivers/net/ethernet/amazon/ena/ena_eth_com.h |  8 +--
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 82 ++++++++++++---------------
 drivers/net/ethernet/amazon/ena/ena_netdev.h  | 11 ++++
 5 files changed, 67 insertions(+), 64 deletions(-)

-- 
2.15.2.AMZN

^ permalink raw reply

* [PATCH net V2 6/7] net: ena: fix missing calls to READ_ONCE
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

Add READ_ONCE calls where necessary (for example when iterating
over a memory field that gets updated by the hardware).

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 17f12c18d225..c37deef3bcf1 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -459,7 +459,7 @@ static void ena_com_handle_admin_completion(struct ena_com_admin_queue *admin_qu
 	cqe = &admin_queue->cq.entries[head_masked];
 
 	/* Go over all the completions */
-	while ((cqe->acq_common_descriptor.flags &
+	while ((READ_ONCE(cqe->acq_common_descriptor.flags) &
 			ENA_ADMIN_ACQ_COMMON_DESC_PHASE_MASK) == phase) {
 		/* Do not read the rest of the completion entry before the
 		 * phase bit was validated
@@ -637,7 +637,7 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev *ena_dev, u16 offset)
 
 	mmiowb();
 	for (i = 0; i < timeout; i++) {
-		if (read_resp->req_id == mmio_read->seq_num)
+		if (READ_ONCE(read_resp->req_id) == mmio_read->seq_num)
 			break;
 
 		udelay(1);
@@ -1796,8 +1796,8 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 	aenq_common = &aenq_e->aenq_common_desc;
 
 	/* Go over all the events */
-	while ((aenq_common->flags & ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) ==
-	       phase) {
+	while ((READ_ONCE(aenq_common->flags) &
+		ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
 		pr_debug("AENQ! Group[%x] Syndrom[%x] timestamp: [%llus]\n",
 			 aenq_common->group, aenq_common->syndrom,
 			 (u64)aenq_common->timestamp_low +
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 7/7] net: ena: fix incorrect usage of memory barriers
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

Added memory barriers where they were missing to support multiple
architectures, and removed redundant ones.

As part of removing the redundant memory barriers and improving
performance, we moved to more relaxed versions of memory barriers,
as well as to the more relaxed version of writel - writel_relaxed,
while maintaining correctness.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c     | 16 +++++++-------
 drivers/net/ethernet/amazon/ena/ena_eth_com.c |  6 ++++++
 drivers/net/ethernet/amazon/ena/ena_eth_com.h |  8 ++-----
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 30 ++++++++++-----------------
 4 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index c37deef3bcf1..7635c38e77dd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -464,7 +464,7 @@ static void ena_com_handle_admin_completion(struct ena_com_admin_queue *admin_qu
 		/* Do not read the rest of the completion entry before the
 		 * phase bit was validated
 		 */
-		rmb();
+		dma_rmb();
 		ena_com_handle_single_admin_completion(admin_queue, cqe);
 
 		head_masked++;
@@ -627,15 +627,8 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev *ena_dev, u16 offset)
 	mmio_read_reg |= mmio_read->seq_num &
 			ENA_REGS_MMIO_REG_READ_REQ_ID_MASK;
 
-	/* make sure read_resp->req_id get updated before the hw can write
-	 * there
-	 */
-	wmb();
-
-	writel_relaxed(mmio_read_reg,
-		       ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
+	writel(mmio_read_reg, ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
 
-	mmiowb();
 	for (i = 0; i < timeout; i++) {
 		if (READ_ONCE(read_resp->req_id) == mmio_read->seq_num)
 			break;
@@ -1798,6 +1791,11 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 	/* Go over all the events */
 	while ((READ_ONCE(aenq_common->flags) &
 		ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
+		/* Make sure the phase bit (ownership) is as expected before
+		 * reading the rest of the descriptor.
+		 */
+		dma_rmb();
+
 		pr_debug("AENQ! Group[%x] Syndrom[%x] timestamp: [%llus]\n",
 			 aenq_common->group, aenq_common->syndrom,
 			 (u64)aenq_common->timestamp_low +
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
index ea149c134e15..1c682b76190f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
@@ -51,6 +51,11 @@ static inline struct ena_eth_io_rx_cdesc_base *ena_com_get_next_rx_cdesc(
 	if (desc_phase != expected_phase)
 		return NULL;
 
+	/* Make sure we read the rest of the descriptor after the phase bit
+	 * has been read
+	 */
+	dma_rmb();
+
 	return cdesc;
 }
 
@@ -493,6 +498,7 @@ int ena_com_tx_comp_req_id_get(struct ena_com_io_cq *io_cq, u16 *req_id)
 	if (cdesc_phase != expected_phase)
 		return -EAGAIN;
 
+	dma_rmb();
 	if (unlikely(cdesc->req_id >= io_cq->q_depth)) {
 		pr_err("Invalid req id %d\n", cdesc->req_id);
 		return -EINVAL;
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 6fdc753d9483..2f7657227cfe 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -107,8 +107,7 @@ static inline int ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
 	return io_sq->q_depth - 1 - cnt;
 }
 
-static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
-					    bool relaxed)
+static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
 {
 	u16 tail;
 
@@ -117,10 +116,7 @@ static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
 	pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
 		 io_sq->qid, tail);
 
-	if (relaxed)
-		writel_relaxed(tail, io_sq->db_addr);
-	else
-		writel(tail, io_sq->db_addr);
+	writel(tail, io_sq->db_addr);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index b9ce2a6a87ed..29b5774dd32d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -551,14 +551,9 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
 			    rx_ring->qid, i, num);
 	}
 
-	if (likely(i)) {
-		/* Add memory barrier to make sure the desc were written before
-		 * issue a doorbell
-		 */
-		wmb();
-		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
-		mmiowb();
-	}
+	/* ena_com_write_sq_doorbell issues a wmb() */
+	if (likely(i))
+		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
 
 	rx_ring->next_to_use = next_to_use;
 
@@ -2112,12 +2107,6 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
 		tx_ring->ring_size);
 
-	/* This WMB is aimed to:
-	 * 1 - perform smp barrier before reading next_to_completion
-	 * 2 - make sure the desc were written before trigger DB
-	 */
-	wmb();
-
 	/* stop the queue when no more space available, the packet can have up
 	 * to sgl_size + 2. one for the meta descriptor and one for header
 	 * (if the header is larger than tx_max_header_size).
@@ -2136,10 +2125,11 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * stop the queue but meanwhile clean_tx_irq updates
 		 * next_to_completion and terminates.
 		 * The queue will remain stopped forever.
-		 * To solve this issue this function perform rmb, check
-		 * the wakeup condition and wake up the queue if needed.
+		 * To solve this issue add a mb() to make sure that
+		 * netif_tx_stop_queue() write is vissible before checking if
+		 * there is additional space in the queue.
 		 */
-		smp_rmb();
+		smp_mb();
 
 		if (ena_com_sq_empty_space(tx_ring->ena_com_io_sq)
 				> ENA_TX_WAKEUP_THRESH) {
@@ -2151,8 +2141,10 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	if (netif_xmit_stopped(txq) || !skb->xmit_more) {
-		/* trigger the dma engine */
-		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
+		/* trigger the dma engine. ena_com_write_sq_doorbell()
+		 * has a mb
+		 */
+		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
 		u64_stats_update_begin(&tx_ring->syncp);
 		tx_ring->tx_stats.doorbells++;
 		u64_stats_update_end(&tx_ring->syncp);
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 5/7] net: ena: fix missing lock during device destruction
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

acquire the rtnl_lock during device destruction to avoid
using partially destroyed device.

ena_remove() shares almost the same logic as ena_destroy_device(),
so use ena_destroy_device() and avoid duplications.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a68c2a8d4da2..b9ce2a6a87ed 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3421,24 +3421,18 @@ static void ena_remove(struct pci_dev *pdev)
 
 	unregister_netdev(netdev);
 
-	/* Reset the device only if the device is running. */
+	/* If the device is running then we want to make sure the device will be
+	 * reset to make sure no more events will be issued by the device.
+	 */
 	if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags))
-		ena_com_dev_reset(ena_dev, adapter->reset_reason);
-
-	ena_free_mgmnt_irq(adapter);
+		set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
 
-	ena_disable_msix(adapter);
+	rtnl_lock();
+	ena_destroy_device(adapter, true);
+	rtnl_unlock();
 
 	free_netdev(netdev);
 
-	ena_com_mmio_reg_read_request_destroy(ena_dev);
-
-	ena_com_abort_admin_commands(ena_dev);
-
-	ena_com_wait_for_abort_completion(ena_dev);
-
-	ena_com_admin_destroy(ena_dev);
-
 	ena_com_rss_destroy(ena_dev);
 
 	ena_com_delete_debug_area(ena_dev);
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 4/7] net: ena: fix potential double ena_destroy_device()
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

ena_destroy_device() can potentially be called twice.
To avoid this, check that the device is running and
only then proceed destroying it.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 035d47d2179a..a68c2a8d4da2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2556,6 +2556,9 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful)
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
 	bool dev_up;
 
+	if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags))
+		return;
+
 	netif_carrier_off(netdev);
 
 	del_timer_sync(&adapter->timer_service);
@@ -2592,6 +2595,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful)
 	adapter->reset_reason = ENA_REGS_RESET_NORMAL;
 
 	clear_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+	clear_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
 }
 
 static int ena_restore_device(struct ena_adapter *adapter)
@@ -2636,6 +2640,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		}
 	}
 
+	set_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
 	mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ));
 	dev_err(&pdev->dev, "Device reset completed successfully\n");
 
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 2/7] net: ena: fix driver when PAGE_SIZE == 64kB
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

The buffer length field in the ena rx descriptor is 16 bit, and the
current driver passes a full page in each ena rx descriptor.
When PAGE_SIZE equals 64kB or more, the buffer length field becomes
zero.
To solve this issue, limit the ena Rx descriptor to use 16kB even
when allocating 64kB kernel pages. This change would not impact ena
device functionality, as 16kB is still larger than maximum MTU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 +++++-----
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 11 +++++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 170830b807fe..69e684fd2787 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -461,7 +461,7 @@ static inline int ena_alloc_rx_page(struct ena_ring *rx_ring,
 		return -ENOMEM;
 	}
 
-	dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE,
+	dma = dma_map_page(rx_ring->dev, page, 0, ENA_PAGE_SIZE,
 			   DMA_FROM_DEVICE);
 	if (unlikely(dma_mapping_error(rx_ring->dev, dma))) {
 		u64_stats_update_begin(&rx_ring->syncp);
@@ -478,7 +478,7 @@ static inline int ena_alloc_rx_page(struct ena_ring *rx_ring,
 	rx_info->page_offset = 0;
 	ena_buf = &rx_info->ena_buf;
 	ena_buf->paddr = dma;
-	ena_buf->len = PAGE_SIZE;
+	ena_buf->len = ENA_PAGE_SIZE;
 
 	return 0;
 }
@@ -495,7 +495,7 @@ static void ena_free_rx_page(struct ena_ring *rx_ring,
 		return;
 	}
 
-	dma_unmap_page(rx_ring->dev, ena_buf->paddr, PAGE_SIZE,
+	dma_unmap_page(rx_ring->dev, ena_buf->paddr, ENA_PAGE_SIZE,
 		       DMA_FROM_DEVICE);
 
 	__free_page(page);
@@ -916,10 +916,10 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring,
 	do {
 		dma_unmap_page(rx_ring->dev,
 			       dma_unmap_addr(&rx_info->ena_buf, paddr),
-			       PAGE_SIZE, DMA_FROM_DEVICE);
+			       ENA_PAGE_SIZE, DMA_FROM_DEVICE);
 
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_info->page,
-				rx_info->page_offset, len, PAGE_SIZE);
+				rx_info->page_offset, len, ENA_PAGE_SIZE);
 
 		netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev,
 			  "rx skb updated. len %d. data_len %d\n",
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index f1972b5ab650..7c7ae56c52cf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -355,4 +355,15 @@ void ena_dump_stats_to_buf(struct ena_adapter *adapter, u8 *buf);
 
 int ena_get_sset_count(struct net_device *netdev, int sset);
 
+/* The ENA buffer length fields is 16 bit long. So when PAGE_SIZE == 64kB the
+ * driver passas 0.
+ * Since the max packet size the ENA handles is ~9kB limit the buffer length to
+ * 16kB.
+ */
+#if PAGE_SIZE > SZ_16K
+#define ENA_PAGE_SIZE SZ_16K
+#else
+#define ENA_PAGE_SIZE PAGE_SIZE
+#endif
+
 #endif /* !(ENA_H) */
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH net V2 1/7] net: ena: fix surprise unplug NULL dereference kernel crash
From: netanel @ 2018-09-09  8:15 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys, gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>

From: Netanel Belgazal <netanel@amazon.com>

Starting with driver version 1.5.0, in case of a surprise device
unplug, there is a race caused by invoking ena_destroy_device()
from two different places. As a result, the readless register might
be accessed after it was destroyed.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index c673ac2df65b..170830b807fe 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3409,12 +3409,12 @@ static void ena_remove(struct pci_dev *pdev)
 		netdev->rx_cpu_rmap = NULL;
 	}
 #endif /* CONFIG_RFS_ACCEL */
-
-	unregister_netdev(netdev);
 	del_timer_sync(&adapter->timer_service);
 
 	cancel_work_sync(&adapter->reset_task);
 
+	unregister_netdev(netdev);
+
 	/* Reset the device only if the device is running. */
 	if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags))
 		ena_com_dev_reset(ena_dev, adapter->reset_reason);
-- 
2.15.2.AMZN

^ permalink raw reply related

* Re: [PATCH net 01/13] net: sched: cls_u32: fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-09 11:37 UTC (permalink / raw)
  To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko, stable
In-Reply-To: <20180909013132.3222-2-viro@ZenIV.linux.org.uk>

On 2018-09-08 9:31 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via
> ->hlist and via ->tp_root together.  u32_destroy() drops the former and, in
> case when there had been links, leaves the sucker on the list.  As the result,
> there's nothing to protect it from getting freed once links are dropped.
> That also makes the "is it busy" check incapable of catching the root hnode -
> it *is* busy (there's a reference from tp), but we don't see it as something
> separate.  "Is it our root?" check partially covers that, but the problem
> exists for others' roots as well.
> 
> AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
> include oopsen, that is) would be this:
>          * count tp->root and tp_c->hlist as separate references.  I.e.
> have u32_init() set refcount to 2, not 1.
> 	* in u32_destroy() we always drop the former; in u32_destroy_hnode() -
> the latter.
> 
> 	That way we have *all* references contributing to refcount.  List
> removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
> an in u32_destroy() in case of tc_u_common going away, along with everything
> reachable from it.  IOW, that way we know that u32_destroy_key() won't
> free something still on the list (or pointed to by someone's ->root).
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

per Cong's earlier remark - the reproducer going in the changelog
would be nice to show i.e this you posted earlier, otherwise:

Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

Reminder:

^ permalink raw reply

* Re: Overlayfs @ Containers and checkpoint/restart micro-conference at LPC2018
From: Amir Goldstein @ 2018-09-09  6:31 UTC (permalink / raw)
  To: christian
  Cc: Stephane Graber, containers, Miklos Szeredi, Netdev, overlayfs,
	lxc-users, LSM List, lxc-devel, linux-fsdevel, zhangyi (F)
In-Reply-To: <20180909013108.6wjlx75ztqjipepg@mailbox.org>

On Sun, Sep 9, 2018 at 4:31 AM Christian Brauner <christian@brauner.io> wrote:
>
...
> > [cc: overlayfs developers]
> >
> > Hi Stéphane!
>
> Hey Amir,
>
> I'm one of the co-organizers of the microconf.
>
> >
> > I am not planing to travel to LPC this year, so this is more of an FYI than
> > a CFP, but maybe another overlayfs developer can pick up this glove??
>
> Sure, that would be great.
>
> >
> > For the past two years I have participated in the effort to fix overlayfs
> > "non-standard" behavior:
> > https://github.com/amir73il/overlayfs/wiki/Overlayfs-non-standard-behavior
>
> Yes, this is an issue that we were aware of for a long time and it
> something that has made overlayfs somewhat more difficult to use than it
> should be.
>
> >
> > Allegedly, this effort went underway to improve the experience of overlayfs
> > users, who are mostly applications running inside containers. For backward
> > compatibility reasons, container runtimes will need to opt-in for fixing some
> > of the legacy behavior.
> >
> > In reality, I have seen very little cross list interaction between linux-unionfs
> > and containers mailing lists. The only interaction I recall in the
> > past two years
> > ended up in a fix in overlayfs to require opt-in for fixing yet another backward
> > compatible bad behavior, although docker did follow up shortly after to fix
> > bad practice in container runtime:
> > https://github.com/moby/moby/issues/34672
> >
> > So the questions I would like to relay to the micro-conf participants w.r.t the
> > new opt-in overlayfs behavior:
> > 1. Did you know?
>
> I personally did not know about the new opt-in behavior. More reason to
> give a talk! :)
>
> > 2. Do you care?
>
> Yes, we do care. However - speaking as LXC upstream now - we have
> recently focused on getting shiftfs to work rather than overlayfs.
>

IMO, as I expressed it in the past, the fact that shiftfs development is not
collaborated with overlayfs developers is a pitty.
Yes shiftfs has a different purpose than overlayfs, but they have common
use cases and common problems as well.

> We are more than happy to have a overlayfs talk at the microconf. If
> someone were to talk about:
> - What non-standard behavior has already been fixed?
> - How has it been fixed?

IMO, those questions are covered quite well by the wiki and overlayfs.txt
documentation in kernel tree.

> - What non-standard behavior still needs to be fixed?

There's the mmap MAP_SHARED case covered in the wiki
and there may be other small stuff, but not sure if anyone cares
about them, so the question should really be directed back to the audience...

> - Outstanding problems that either still need a solution or
>   are solved but one would like feedback on the implementation. This way
>   we can have a good discussion.
>

I think one of the chsallange that distros and container runtime will need to
deal with is managing format versions of overlay "images".
The reason the new features require user or distro to opt-in is because
the new features create overlayfs images that are not fully compatible with old
kernels and existing container image tools (i.e. export/migrate image).

The new overlayfs-progs project by Zhangyi is going to help in that respect:
https://github.com/hisilicon/overlayfs-progs
As well as Zhangyi's work on overlayfs feature set support:
https://marc.info/?l=linux-unionfs&m=153302911328159&w=2

Thanks,
Amir.

^ permalink raw reply

* Re: [PATCH net] ipv6: use rt6_info members when dst is set in rt6_fill_node
From: Xin Long @ 2018-09-09  6:29 UTC (permalink / raw)
  To: David Ahern; +Cc: network dev, davem, Roopa Prabhu
In-Reply-To: <4abf5ec6-862b-f88e-b0ce-ddcd4308fd59@cumulusnetworks.com>

On Sun, Sep 9, 2018 at 9:45 AM David Ahern <dsa@cumulusnetworks.com> wrote:
>
> On 9/8/18 3:24 AM, Xin Long wrote:
> > In inet6_rtm_getroute, since Commit 93531c674315 ("net/ipv6: separate
> > handling of FIB entries from dst based routes"), it has used rt->from
> > to dump route info instead of rt.
> >
> > However for some route like cache, its information is not the same as
> > that of the 'from' one. It caused 'ip -6 route get' to dump the wrong
> > route information.
> >
> > In Jianlin's testing, the output information even lost the expiration
> > time for a pmtu route cache due to the wrong fib6_flags.
>
> you are right about the flags ...
>
> >
> > So change to use rt6_info members when it tries to dump a route entry
> > without fibmatch set.
>
> but not the src, dst and prefsrc.
>
> >
> > Note that we will fix the gw/nh dump in another patch.
>
> And only the gateway can change do to a redirect and redirects do not
> change the device - only the gateway.
>
> Let's do both changes in a single patch.
>
> >
> > Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> > Reported-by: Jianlin Shi <jishi@redhat.com>
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> >  net/ipv6/route.c | 39 ++++++++++++++++++++++++++-------------
> >  1 file changed, 26 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 18e00ce..e554922 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -4670,20 +4670,33 @@ static int rt6_fill_node(struct net *net, struct sk_buff *skb,
> >                        int iif, int type, u32 portid, u32 seq,
> >                        unsigned int flags)
> >  {
> > -     struct rtmsg *rtm;
> > +     struct rt6key *fib6_prefsrc, *fib6_dst, *fib6_src;
> > +     struct rt6_info *rt6 = (struct rt6_info *)dst;
> > +     u32 *pmetrics, table, fib6_flags;
> >       struct nlmsghdr *nlh;
> > +     struct rtmsg *rtm;
> >       long expires = 0;
> > -     u32 *pmetrics;
> > -     u32 table;
> >
> >       nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
> >       if (!nlh)
> >               return -EMSGSIZE;
> >
> > +     if (rt6) {
> > +             fib6_dst = &rt6->rt6i_dst;
> > +             fib6_src = &rt6->rt6i_src;
> > +             fib6_flags = rt6->rt6i_flags;
> > +             fib6_prefsrc = &rt6->rt6i_prefsrc;
> > +     } else {
> > +             fib6_dst = &rt->fib6_dst;
> > +             fib6_src = &rt->fib6_src;
> > +             fib6_flags = rt->fib6_flags;
> > +             fib6_prefsrc = &rt->fib6_prefsrc;
> > +     }
>
> Unless I am missing something at the moment, an rt6_info can only have
> the same dst, src and prefsrc as the fib6_info on which it is based.
> Thus, only the flags is needed above. That simplifies this patch a lot.
If dst, src and prefsrc in rt6_info are always the same as these in fib6_info,
why do we need them in rt6_info? we could just get it by 'from'.

^ permalink raw reply

* Re: [PATCH RFC 3/5] coccinelle: add xxxsetbitsXX converting spatch
From: SF Markus Elfring @ 2018-09-09 11:13 UTC (permalink / raw)
  To: Corentin Labbe, kernel-janitors
  Cc: Coccinelle, linux-amlogic, linux-arm-kernel, linux-ide,
	linux-kernel, linux-sunxi, linuxppc-dev, netdev, Alexandre Torgue,
	Alistair Popple, Anatolij Gustschin, Benjamin Herrenschmidt,
	Carlo Caione, Chen-Yu Tsai, David S. Miller, Gilles Muller,
	Giuseppe Cavallaro, Jose Abreu, Julia Lawall, Kevin Hilman,
	Kumar Gala
In-Reply-To: <1536349307-20714-4-git-send-email-clabbe@baylibre.com>

How do you think about to add any more meta-data for this semantic patch script?

* SPDX identifier
* Copyright information
* Confidence level

https://bottest.wiki.kernel.org/coccicheck

> +virtual context

Further variables will be needed if you would like to use corresponding
operation modes (besides “patch”).

> +expression reg;
> +expression set;
> +expression clear;

I propose once more to avoid the repetition of (unnecessary) SmPL code.
This part could be written like the following instead.

+expression clear, set, reg;

If you would increase the usage of SmPL disjunctions, the specification
of duplicate SmPL code could be reduced considerably.

* Would you like to merge SmPL rules based on the distinction between
  the data types “u32” and “u64”?

* Did you identify any optional code in this transformation approach?

> +@@
> +expression base;
> +expression offset;
> +expression value;
> +@@
> +
> +- mtu3_setbits(base, offset, value);
> ++ setbits32(base + offset, value);
> +
> +@@
> +expression base;
> +expression offset;
> +expression mask;
> +@@
> +
> +- mtu3_clrbits(base, offset, mask);
> ++ clrbits32(base + offset, mask);

Another update suggestion:

+@replacement@
+expression base, offset;
+@@
+(
+-mtu3_clrbits
++clrbits32
+|
+-mtu3_setbits
++setbits32
+)(base
+-     ,
++ +
+  offset, ...);

Would you like to try further software fine-tuning out?

Regards,
Markus

^ permalink raw reply

* Re: [PATCH net-next v8 3/7] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-09-09 10:58 UTC (permalink / raw)
  To: jasowang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <179340e9-f1cd-8378-1a01-61307546003c@redhat.com>

On Tue, Aug 21, 2018 at 11:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Factor out generic busy polling logic and will be
> > used for in tx path in the next patch. And with the patch,
> > qemu can set differently the busyloop_timeout for rx queue.
> >
> > To avoid duplicate codes, introduce the helper functions:
> > * sock_has_rx_data(changed from sk_has_rx_data)
> > * vhost_net_busy_poll_try_queue
> >
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >   drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++-------------------
> >   1 file changed, 71 insertions(+), 40 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 32c1b52..453c061 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -440,6 +440,75 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
> >       nvq->done_idx = 0;
> >   }
> >
> > +static int sock_has_rx_data(struct socket *sock)
> > +{
> > +     if (unlikely(!sock))
> > +             return 0;
> > +
> > +     if (sock->ops->peek_len)
> > +             return sock->ops->peek_len(sock);
> > +
> > +     return skb_queue_empty(&sock->sk->sk_receive_queue);
> > +}
> > +
> > +static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
> > +                                       struct vhost_virtqueue *vq)
> > +{
> > +     if (!vhost_vq_avail_empty(&net->dev, vq)) {
> > +             vhost_poll_queue(&vq->poll);
> > +     } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> > +             vhost_disable_notify(&net->dev, vq);
> > +             vhost_poll_queue(&vq->poll);
> > +     }
> > +}
> > +
> > +static void vhost_net_busy_poll(struct vhost_net *net,
> > +                             struct vhost_virtqueue *rvq,
> > +                             struct vhost_virtqueue *tvq,
> > +                             bool *busyloop_intr,
> > +                             bool poll_rx)
> > +{
> > +     unsigned long busyloop_timeout;
> > +     unsigned long endtime;
> > +     struct socket *sock;
> > +     struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
> > +
> > +     mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> > +     vhost_disable_notify(&net->dev, vq);
> > +     sock = rvq->private_data;
> > +
> > +     busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
> > +                                  tvq->busyloop_timeout;
> > +
> > +     preempt_disable();
> > +     endtime = busy_clock() + busyloop_timeout;
> > +
> > +     while (vhost_can_busy_poll(endtime)) {
> > +             if (vhost_has_work(&net->dev)) {
> > +                     *busyloop_intr = true;
> > +                     break;
> > +             }
> > +
> > +             if ((sock_has_rx_data(sock) &&
> > +                  !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > +                 !vhost_vq_avail_empty(&net->dev, tvq))
> > +                     break;
> > +
> > +             cpu_relax();
> > +     }
> > +
> > +     preempt_enable();
> > +
> > +     if (poll_rx)
> > +             vhost_net_busy_poll_try_queue(net, tvq);
> > +     else if (sock_has_rx_data(sock))
> > +             vhost_net_busy_poll_try_queue(net, rvq);
>
> This could be simplified like:
>
> if (poll_rx || sock_has_rx_data(sock))
>      vhost_net_busy_poll_try_queue(net, vq);
done, thanks

> Thanks
>
> > +     else /* On tx here, sock has no rx data. */
> > +             vhost_enable_notify(&net->dev, rvq);
> > +
> > +     mutex_unlock(&vq->mutex);
> > +}
> > +
> >   static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> >                                   struct vhost_net_virtqueue *nvq,
> >                                   unsigned int *out_num, unsigned int *in_num,
> > @@ -753,16 +822,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
> >       return len;
> >   }
> >
> > -static int sk_has_rx_data(struct sock *sk)
> > -{
> > -     struct socket *sock = sk->sk_socket;
> > -
> > -     if (sock->ops->peek_len)
> > -             return sock->ops->peek_len(sock);
> > -
> > -     return skb_queue_empty(&sk->sk_receive_queue);
> > -}
> > -
> >   static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> >                                     bool *busyloop_intr)
> >   {
> > @@ -770,41 +829,13 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> >       struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> >       struct vhost_virtqueue *rvq = &rnvq->vq;
> >       struct vhost_virtqueue *tvq = &tnvq->vq;
> > -     unsigned long uninitialized_var(endtime);
> >       int len = peek_head_len(rnvq, sk);
> >
> > -     if (!len && tvq->busyloop_timeout) {
> > +     if (!len && rvq->busyloop_timeout) {
> >               /* Flush batched heads first */
> >               vhost_net_signal_used(rnvq);
> >               /* Both tx vq and rx socket were polled here */
> > -             mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
> > -             vhost_disable_notify(&net->dev, tvq);
> > -
> > -             preempt_disable();
> > -             endtime = busy_clock() + tvq->busyloop_timeout;
> > -
> > -             while (vhost_can_busy_poll(endtime)) {
> > -                     if (vhost_has_work(&net->dev)) {
> > -                             *busyloop_intr = true;
> > -                             break;
> > -                     }
> > -                     if ((sk_has_rx_data(sk) &&
> > -                          !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > -                         !vhost_vq_avail_empty(&net->dev, tvq))
> > -                             break;
> > -                     cpu_relax();
> > -             }
> > -
> > -             preempt_enable();
> > -
> > -             if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> > -                     vhost_poll_queue(&tvq->poll);
> > -             } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> > -                     vhost_disable_notify(&net->dev, tvq);
> > -                     vhost_poll_queue(&tvq->poll);
> > -             }
> > -
> > -             mutex_unlock(&tvq->mutex);
> > +             vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
> >
> >               len = peek_head_len(rnvq, sk);
> >       }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next v8 5/7] net: vhost: introduce bitmap for vhost_poll
From: Tonghao Zhang @ 2018-09-09 10:58 UTC (permalink / raw)
  To: jasowang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <24f10821-a07e-e0cd-0266-b33856e7d88a@redhat.com>

On Tue, Aug 21, 2018 at 8:45 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月19日 20:11, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > The bitmap of vhost_dev can help us to check if the
> > specified poll is scheduled. This patch will be used
> > for next two patches.
> >
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >   drivers/vhost/net.c   | 11 +++++++++--
> >   drivers/vhost/vhost.c | 17 +++++++++++++++--
> >   drivers/vhost/vhost.h |  7 ++++++-
> >   3 files changed, 30 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 1eff72d..23d7ffc 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -1135,8 +1135,15 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> >       }
> >       vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
> >
> > -     vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
> > -     vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
> > +     vhost_poll_init(n->poll + VHOST_NET_VQ_TX,
> > +                     handle_tx_net,
> > +                     VHOST_NET_VQ_TX,
> > +                     EPOLLOUT, dev);
> > +
> > +     vhost_poll_init(n->poll + VHOST_NET_VQ_RX,
> > +                     handle_rx_net,
> > +                     VHOST_NET_VQ_RX,
> > +                     EPOLLIN, dev);
> >
> >       f->private_data = n;
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index a1c06e7..dc88a60 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -186,7 +186,7 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
> >
> >   /* Init poll structure */
> >   void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> > -                  __poll_t mask, struct vhost_dev *dev)
> > +                  __u8 poll_id, __poll_t mask, struct vhost_dev *dev)
> >   {
> >       init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
> >       init_poll_funcptr(&poll->table, vhost_poll_func);
> > @@ -194,6 +194,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> >       poll->dev = dev;
> >       poll->wqh = NULL;
> >
> > +     poll->poll_id = poll_id;
> >       vhost_work_init(&poll->work, fn);
> >   }
> >   EXPORT_SYMBOL_GPL(vhost_poll_init);
> > @@ -276,8 +277,16 @@ bool vhost_has_work(struct vhost_dev *dev)
> >   }
> >   EXPORT_SYMBOL_GPL(vhost_has_work);
> >
> > +bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id)
> > +{
> > +     return !llist_empty(&dev->work_list) &&
> > +             test_bit(poll_id, dev->work_pending);
>
> I think we've already had something similar. E.g can we test
> VHOST_WORK_QUEUED instead?
done, thanks. I am busy two weeks ago, and now I focus on these patches.
> Thanks
>
> > +}
> > +EXPORT_SYMBOL_GPL(vhost_has_work_pending);
> > +
> >   void vhost_poll_queue(struct vhost_poll *poll)
> >   {
> > +     set_bit(poll->poll_id, poll->dev->work_pending);
> >       vhost_work_queue(poll->dev, &poll->work);
> >   }
> >   EXPORT_SYMBOL_GPL(vhost_poll_queue);
> > @@ -354,6 +363,7 @@ static int vhost_worker(void *data)
> >               if (!node)
> >                       schedule();
> >
> > +             bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
> >               node = llist_reverse_order(node);
> >               /* make sure flag is seen after deletion */
> >               smp_wmb();
> > @@ -420,6 +430,8 @@ void vhost_dev_init(struct vhost_dev *dev,
> >       struct vhost_virtqueue *vq;
> >       int i;
> >
> > +     BUG_ON(nvqs > VHOST_DEV_MAX_VQ);
> > +
> >       dev->vqs = vqs;
> >       dev->nvqs = nvqs;
> >       mutex_init(&dev->mutex);
> > @@ -428,6 +440,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> >       dev->iotlb = NULL;
> >       dev->mm = NULL;
> >       dev->worker = NULL;
> > +     bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
> >       init_llist_head(&dev->work_list);
> >       init_waitqueue_head(&dev->wait);
> >       INIT_LIST_HEAD(&dev->read_list);
> > @@ -445,7 +458,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> >               vhost_vq_reset(dev, vq);
> >               if (vq->handle_kick)
> >                       vhost_poll_init(&vq->poll, vq->handle_kick,
> > -                                     EPOLLIN, dev);
> > +                                     i, EPOLLIN, dev);
> >       }
> >   }
> >   EXPORT_SYMBOL_GPL(vhost_dev_init);
> > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > index 6c844b9..60b6f6d 100644
> > --- a/drivers/vhost/vhost.h
> > +++ b/drivers/vhost/vhost.h
> > @@ -30,6 +30,7 @@ struct vhost_poll {
> >       wait_queue_head_t        *wqh;
> >       wait_queue_entry_t              wait;
> >       struct vhost_work         work;
> > +     __u8                      poll_id;
> >       __poll_t                  mask;
> >       struct vhost_dev         *dev;
> >   };
> > @@ -37,9 +38,10 @@ struct vhost_poll {
> >   void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
> >   void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
> >   bool vhost_has_work(struct vhost_dev *dev);
> > +bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id);
> >
> >   void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> > -                  __poll_t mask, struct vhost_dev *dev);
> > +                  __u8 id, __poll_t mask, struct vhost_dev *dev);
> >   int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> >   void vhost_poll_stop(struct vhost_poll *poll);
> >   void vhost_poll_flush(struct vhost_poll *poll);
> > @@ -152,6 +154,8 @@ struct vhost_msg_node {
> >     struct list_head node;
> >   };
> >
> > +#define VHOST_DEV_MAX_VQ     128
> > +
> >   struct vhost_dev {
> >       struct mm_struct *mm;
> >       struct mutex mutex;
> > @@ -159,6 +163,7 @@ struct vhost_dev {
> >       int nvqs;
> >       struct eventfd_ctx *log_ctx;
> >       struct llist_head work_list;
> > +     DECLARE_BITMAP(work_pending, VHOST_DEV_MAX_VQ);
> >       struct task_struct *worker;
> >       struct vhost_umem *umem;
> >       struct vhost_umem *iotlb;
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net] ipv6: use rt6_info members when dst is set in rt6_fill_node
From: David Ahern @ 2018-09-09  1:44 UTC (permalink / raw)
  To: Xin Long, network dev; +Cc: davem, Roopa Prabhu
In-Reply-To: <840f0c017c670ae9cc8fd8330ca24fcca1207918.1536398641.git.lucien.xin@gmail.com>

On 9/8/18 3:24 AM, Xin Long wrote:
> In inet6_rtm_getroute, since Commit 93531c674315 ("net/ipv6: separate
> handling of FIB entries from dst based routes"), it has used rt->from
> to dump route info instead of rt.
> 
> However for some route like cache, its information is not the same as
> that of the 'from' one. It caused 'ip -6 route get' to dump the wrong
> route information.
> 
> In Jianlin's testing, the output information even lost the expiration
> time for a pmtu route cache due to the wrong fib6_flags.

you are right about the flags ...

> 
> So change to use rt6_info members when it tries to dump a route entry
> without fibmatch set.

but not the src, dst and prefsrc.

> 
> Note that we will fix the gw/nh dump in another patch.

And only the gateway can change do to a redirect and redirects do not
change the device - only the gateway.

Let's do both changes in a single patch.

> 
> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/ipv6/route.c | 39 ++++++++++++++++++++++++++-------------
>  1 file changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 18e00ce..e554922 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4670,20 +4670,33 @@ static int rt6_fill_node(struct net *net, struct sk_buff *skb,
>  			 int iif, int type, u32 portid, u32 seq,
>  			 unsigned int flags)
>  {
> -	struct rtmsg *rtm;
> +	struct rt6key *fib6_prefsrc, *fib6_dst, *fib6_src;
> +	struct rt6_info *rt6 = (struct rt6_info *)dst;
> +	u32 *pmetrics, table, fib6_flags;
>  	struct nlmsghdr *nlh;
> +	struct rtmsg *rtm;
>  	long expires = 0;
> -	u32 *pmetrics;
> -	u32 table;
>  
>  	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
>  	if (!nlh)
>  		return -EMSGSIZE;
>  
> +	if (rt6) {
> +		fib6_dst = &rt6->rt6i_dst;
> +		fib6_src = &rt6->rt6i_src;
> +		fib6_flags = rt6->rt6i_flags;
> +		fib6_prefsrc = &rt6->rt6i_prefsrc;
> +	} else {
> +		fib6_dst = &rt->fib6_dst;
> +		fib6_src = &rt->fib6_src;
> +		fib6_flags = rt->fib6_flags;
> +		fib6_prefsrc = &rt->fib6_prefsrc;
> +	}

Unless I am missing something at the moment, an rt6_info can only have
the same dst, src and prefsrc as the fib6_info on which it is based.
Thus, only the flags is needed above. That simplifies this patch a lot.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox