Netdev List
 help / color / mirror / Atom feed
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:18 UTC (permalink / raw)
  To: netdev; +Cc: lartc
In-Reply-To: <7e001d45-f692-7d93-0390-f0e6077bf724@cumulusnetworks.com>

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On 06/20/2018 07:48 AM, David Ahern wrote:
> See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl 
> settings.

Where can I find more information on ignore_routes_with_linkdown?  I 
don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt. 
(I do see fib_multipath_use_neigh documented there in.)

> A feature is in the works to have fallback nexthops.

O.o?



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply

* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Thomas Walker @ 2018-06-20 15:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Greg KH, devel, Stephen Hemminger, netdev
In-Reply-To: <20180619161440.2bc69506@xeon-e3>

On Tue, Jun 19, 2018 at 04:14:40PM -0700, Stephen Hemminger wrote:
> On Wed, 20 Jun 2018 08:01:50 +0900
> Greg KH <greg@kroah.com> wrote:
> 
> > On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> > > 
> > > commit be9c798d0d13ae609a91177323ac816545c39d28
> > > Author: Stephen Hemminger <stephen@networkplumber.org>
> > > Date:   Mon May 14 15:32:18 2018 -0700
> > > 
> > >     hv_netvsc: common detach logic
> > >     
> > >     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> > >     
> > >     Make common function for detaching internals of device
> > >     during changes to MTU and RSS. Make sure no more packets
> > >     are transmitted and all packets have been received before
> > >     doing device teardown.
> > > 
> > >     Change the wait logic to be common and use usleep_range().
> > > 
> > >     Changes transmit enabling logic so that transmit queues are disabled
> > >     during the period when lower device is being changed. And enabled
> > >     only after sub channels are setup. This avoids issue where it could
> > >     be that a packet was being sent while subchannel was not initialized.
> > > 
> > >     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> > >     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > 
> > > 
> > > The removal of which does indeed fix the problem.  That led me to:
> > > 
> > > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > > Author: Dexuan Cui <decui@microsoft.com>
> > > Date:   Wed Jun 6 21:32:51 2018 +0000
> > > 
> > >     hv_netvsc: Fix a network regression after ifdown/ifup
> > >     
> > >     Recently people reported the NIC stops working after
> > >     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> > >     enabled, after the refactoring of the common detach logic: when the NIC
> > >     has sub-channels, usually we enable all the TX queues after all
> > >     sub-channels are set up: see rndis_set_subchannel() ->
> > >     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> > >     the number of channels doesn't change, we also must make sure the TX queues
> > >     are enabled. The patch fixes the regression.
> > >     
> > >     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> > >     Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > >     Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > >     Cc: K. Y. Srinivasan <kys@microsoft.com>
> > >     Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > > 
> > > Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> > > 
> > > I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> > > 
> > > (oh, and I did also try 4.18-rc1 and the problem still persists)  
> > 
> > Always cc: the authors of the patch you are having problems with, along
> > with the mailing list for the networking subsystem, so that the right
> > people know to look at this.
> 
> How are you changing the MTU? and starting the device.
> When MTU changes the device has to reconnect with the host. This takes a small amount of time
> and no changes to device state are possible then.
> 
> If MTU change happens and at the same time some other script tries to bring up the connection,
> then that script will get an error.

Simply setting 'mtu 1300' in /etc/network/interfaces (Debian userland) which results in ifup executing:

ip addr add 192.168.1.100/255.255.255.0 broadcast 192.168.1.255 dev eth0 label eth0
ip link set dev eth0 mtu 1300 up                                               
ip route add default via 192.168.1.1 dev eth0                               

Running those commands standalone (outside of ifup) reproduces the problem as well.  Trying lots of different permutations of loading the driver and different commands with different delays, my initial suspicion that this was somehow related to proximity to the driver load appears incorrect.  The only thing that seems to matter is setting the mtu and link state in one go.  If I separate the 'ip link' command into two, 'ip link set dev eth0 up' and 'ip link set dev eth0 mtu 1300', in either order (or if either link is already up or mtu is already 1300), then there is no error.  

(I should note that the above testing is with 4.14.49 with 52acf73)

^ permalink raw reply

* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:38 UTC (permalink / raw)
  To: netdev; +Cc: lartc
In-Reply-To: <e19de6ae-536c-7dc5-22e7-62ee7d077c67@spamtrap.tnetconsulting.net>

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On 06/20/2018 09:18 AM, Grant Taylor wrote:
> Where can I find more information on ignore_routes_with_linkdown?  I 
> don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt. 
> (I do see fib_multipath_use_neigh documented there in.)

I'm specifically interested in if ignore_routes_with_linkdown and / or 
fib_multipath_use_neigh will cause Linux to fall back to an alternate 
(higher metric) route if the link is still up but the neighbor is not 
accessible across it.

           +-------+
           | Linux |
           +---+---+
               |
+-----+   +---+----+   +-----+
| R 1 +---+ Switch +---+ R 2 |
+-----+   +--------+   +-----+

A typical scenario is where Linux is connected to a DSL or Cable modem 
where the physical link stays up even if the neighbor R {1,2} goes 
offline.  It's not possible to rely on the local link (MII) status to 
determine that a neighbor is not reachable.  I.e. R 1 going away like below.

           +-------+
           | Linux |
           +---+---+
               |
+-----+   +---+----+   +-----+
| R 1 X   X Switch +---+ R 2 |
+-----+   +--------+   +-----+



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply

* Re: [Y2038] [PATCH] ceph: use ktime_get_real_seconds()
From: Arnd Bergmann @ 2018-06-20 15:47 UTC (permalink / raw)
  To: Allen Pais
  Cc: Networking, y2038 Mailman List, ceph-devel, David Miller,
	Linux Kernel Mailing List
In-Reply-To: <1529488801-22093-1-git-send-email-allen.lkml@gmail.com>

On Wed, Jun 20, 2018 at 12:00 PM, Allen Pais <allen.lkml@gmail.com> wrote:
> Use ktime_get_real_seconds() as get_seconds() is deprecated.
>
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>

I have done a similar patch and will post it soon along with the rest of the
ceph y2038 series. Please have a look at "ceph: use timespec64 in for
keepalive" and comment if you see something that I missed.

       Arnd

^ permalink raw reply

* [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: Marcelo Ricardo Leitner @ 2018-06-20 15:47 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich

Currently it is incrementing SctpFragUsrMsgs when the user message size
is of the exactly same size as the maximum fragment size, which is wrong.

The fix is to increment it only when user message is bigger than the
maximum fragment size.

Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/chunk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 79daa98208c391c780440144d69bc7be875c3476..bfb9f812e2ef9fa605b08dc1f534781573c3abf8 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -237,7 +237,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 	/* Account for a different sized first fragment */
 	if (msg_len >= first_len) {
 		msg->can_delay = 0;
-		SCTP_INC_STATS(sock_net(asoc->base.sk), SCTP_MIB_FRAGUSRMSGS);
+		if (msg_len > first_len)
+			SCTP_INC_STATS(sock_net(asoc->base.sk),
+				       SCTP_MIB_FRAGUSRMSGS);
 	} else {
 		/* Which may be the only one... */
 		first_len = msg_len;
-- 
2.14.4

^ permalink raw reply related

* [PATCH 1/5] ceph: use timespec64 in for keepalive
From: Arnd Bergmann @ 2018-06-20 15:50 UTC (permalink / raw)
  To: Yan Zheng, Sage Weil, Ilya Dryomov, David S. Miller
  Cc: y2038, ceph-devel, Arnd Bergmann, linux-kernel, netdev

ceph_con_keepalive_expired() is the last user of timespec_add() and some
of the last uses of ktime_get_real_ts().  Replacing this with timespec64
based interfaces  lets us remove that deprecated API.

I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64()
here that take timespec64 structures and convert to/from ceph_timespec,
which is defined to have an unsigned 32-bit tv_sec member. This extends
the range of valid times to year 2106, avoiding the year 2038 overflow.

The ceph file system portion still uses the old functions for inode
timestamps, this will be done separately after the VFS layer is converted.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/ceph/decode.h    | 20 +++++++++++++++++++-
 include/linux/ceph/messenger.h |  2 +-
 net/ceph/auth_x.c              | 14 +++++++-------
 net/ceph/auth_x.h              |  2 +-
 net/ceph/cls_lock_client.c     |  4 ++--
 net/ceph/messenger.c           | 20 ++++++++++----------
 6 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
index d143ac8879c6..45efe09fb7b4 100644
--- a/include/linux/ceph/decode.h
+++ b/include/linux/ceph/decode.h
@@ -194,8 +194,26 @@ ceph_decode_skip_n(p, end, sizeof(u8), bad)
 	} while (0)
 
 /*
- * struct ceph_timespec <-> struct timespec
+ * struct ceph_timespec <-> struct timespec64
  */
+static inline void ceph_decode_timespec64(struct timespec64 *ts,
+					const struct ceph_timespec *tv)
+{
+	/*
+	 * this will still overflow in year 2106. We could extend
+	 * the protocol to steal two more bits from tv_nsec to
+	 * add three more 136 year epochs after that the way ext4
+	 * does if necessary.
+	 */
+	ts->tv_sec = (time64_t)le32_to_cpu(tv->tv_sec);
+	ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
+}
+static inline void ceph_encode_timespec64(struct ceph_timespec *tv,
+					const struct timespec64 *ts)
+{
+	tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
+	tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
+}
 static inline void ceph_decode_timespec(struct timespec *ts,
 					const struct ceph_timespec *tv)
 {
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index c7dfcb8a1fb2..a718b877c597 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -330,7 +330,7 @@ struct ceph_connection {
 	int in_base_pos;     /* bytes read */
 	__le64 in_temp_ack;  /* for reading an ack */
 
-	struct timespec last_keepalive_ack; /* keepalive2 ack stamp */
+	struct timespec64 last_keepalive_ack; /* keepalive2 ack stamp */
 
 	struct delayed_work work;	    /* send|recv work */
 	unsigned long       delay;          /* current delay interval */
diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index 2f4a1baf5f52..b05c3a540a5a 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -149,12 +149,12 @@ static int process_one_ticket(struct ceph_auth_client *ac,
 	void *dp, *dend;
 	int dlen;
 	char is_enc;
-	struct timespec validity;
+	struct timespec64 validity;
 	void *tp, *tpend;
 	void **ptp;
 	struct ceph_crypto_key new_session_key = { 0 };
 	struct ceph_buffer *new_ticket_blob;
-	unsigned long new_expires, new_renew_after;
+	time64_t new_expires, new_renew_after;
 	u64 new_secret_id;
 	int ret;
 
@@ -189,11 +189,11 @@ static int process_one_ticket(struct ceph_auth_client *ac,
 	if (ret)
 		goto out;
 
-	ceph_decode_timespec(&validity, dp);
+	ceph_decode_timespec64(&validity, dp);
 	dp += sizeof(struct ceph_timespec);
-	new_expires = get_seconds() + validity.tv_sec;
+	new_expires = ktime_get_real_seconds() + validity.tv_sec;
 	new_renew_after = new_expires - (validity.tv_sec / 4);
-	dout(" expires=%lu renew_after=%lu\n", new_expires,
+	dout(" expires=%llu renew_after=%llu\n", new_expires,
 	     new_renew_after);
 
 	/* ticket blob for service */
@@ -385,13 +385,13 @@ static bool need_key(struct ceph_x_ticket_handler *th)
 	if (!th->have_key)
 		return true;
 
-	return get_seconds() >= th->renew_after;
+	return ktime_get_real_seconds() >= th->renew_after;
 }
 
 static bool have_key(struct ceph_x_ticket_handler *th)
 {
 	if (th->have_key) {
-		if (get_seconds() >= th->expires)
+		if (ktime_get_real_seconds() >= th->expires)
 			th->have_key = false;
 	}
 
diff --git a/net/ceph/auth_x.h b/net/ceph/auth_x.h
index 454cb54568af..57ba99f7736f 100644
--- a/net/ceph/auth_x.h
+++ b/net/ceph/auth_x.h
@@ -22,7 +22,7 @@ struct ceph_x_ticket_handler {
 	u64 secret_id;
 	struct ceph_buffer *ticket_blob;
 
-	unsigned long renew_after, expires;
+	time64_t renew_after, expires;
 };
 
 #define CEPHX_AU_ENC_BUF_LEN	128  /* big enough for encrypted blob */
diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c
index 8d2032b2f225..2105a6eaa66c 100644
--- a/net/ceph/cls_lock_client.c
+++ b/net/ceph/cls_lock_client.c
@@ -32,7 +32,7 @@ int ceph_cls_lock(struct ceph_osd_client *osdc,
 	int desc_len = strlen(desc);
 	void *p, *end;
 	struct page *lock_op_page;
-	struct timespec mtime;
+	struct timespec64 mtime;
 	int ret;
 
 	lock_op_buf_size = name_len + sizeof(__le32) +
@@ -63,7 +63,7 @@ int ceph_cls_lock(struct ceph_osd_client *osdc,
 	ceph_encode_string(&p, end, desc, desc_len);
 	/* only support infinite duration */
 	memset(&mtime, 0, sizeof(mtime));
-	ceph_encode_timespec(p, &mtime);
+	ceph_encode_timespec64(p, &mtime);
 	p += sizeof(struct ceph_timespec);
 	ceph_encode_8(&p, flags);
 
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index c6413c360771..3f6336248509 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1417,11 +1417,11 @@ static void prepare_write_keepalive(struct ceph_connection *con)
 	dout("prepare_write_keepalive %p\n", con);
 	con_out_kvec_reset(con);
 	if (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2) {
-		struct timespec now;
+		struct timespec64 now;
 
-		ktime_get_real_ts(&now);
+		ktime_get_real_ts64(&now);
 		con_out_kvec_add(con, sizeof(tag_keepalive2), &tag_keepalive2);
-		ceph_encode_timespec(&con->out_temp_keepalive2, &now);
+		ceph_encode_timespec64(&con->out_temp_keepalive2, &now);
 		con_out_kvec_add(con, sizeof(con->out_temp_keepalive2),
 				 &con->out_temp_keepalive2);
 	} else {
@@ -2555,7 +2555,7 @@ static int read_keepalive_ack(struct ceph_connection *con)
 	int ret = read_partial(con, size, size, &ceph_ts);
 	if (ret <= 0)
 		return ret;
-	ceph_decode_timespec(&con->last_keepalive_ack, &ceph_ts);
+	ceph_decode_timespec64(&con->last_keepalive_ack, &ceph_ts);
 	prepare_read_tag(con);
 	return 1;
 }
@@ -3223,12 +3223,12 @@ bool ceph_con_keepalive_expired(struct ceph_connection *con,
 {
 	if (interval > 0 &&
 	    (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2)) {
-		struct timespec now;
-		struct timespec ts;
-		ktime_get_real_ts(&now);
-		jiffies_to_timespec(interval, &ts);
-		ts = timespec_add(con->last_keepalive_ack, ts);
-		return timespec_compare(&now, &ts) >= 0;
+		struct timespec64 now;
+		struct timespec64 ts;
+		ktime_get_real_ts64(&now);
+		jiffies_to_timespec64(interval, &ts);
+		ts = timespec64_add(con->last_keepalive_ack, ts);
+		return timespec64_compare(&now, &ts) >= 0;
 	}
 	return false;
 }
-- 
2.9.0

^ permalink raw reply related

* [PATCH 4/5] ceph: use timespec64 for r_mtime
From: Arnd Bergmann @ 2018-06-20 15:50 UTC (permalink / raw)
  To: Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder
  Cc: Jens Axboe, Jan Kara, Martin K. Petersen, Arnd Bergmann, y2038,
	netdev, linux-kernel, Daniel Jordan, linux-block, ceph-devel,
	Jason Dillaman, David S. Miller
In-Reply-To: <20180620155101.57685-1-arnd@arndb.de>

The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/block/rbd.c             |  2 +-
 fs/ceph/addr.c                  | 12 ++++++------
 fs/ceph/file.c                  | 11 +++++------
 include/linux/ceph/osd_client.h |  6 +++---
 net/ceph/osd_client.c           |  8 ++++----
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fa0729c1e776..356936333cd9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1452,7 +1452,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
 	struct ceph_osd_request *osd_req = obj_request->osd_req;
 
 	osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
-	ktime_get_real_ts(&osd_req->r_mtime);
+	ktime_get_real_ts64(&osd_req->r_mtime);
 	osd_req->r_data_offset = obj_request->ex.oe_off;
 }
 
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 292b3d72d725..d44d51e69e76 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -574,7 +574,7 @@ static u64 get_writepages_data_length(struct inode *inode,
  */
 static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 {
-	struct timespec ts;
+	struct timespec64 ts;
 	struct inode *inode;
 	struct ceph_inode_info *ci;
 	struct ceph_fs_client *fsc;
@@ -625,7 +625,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
 	set_page_writeback(page);
-	ts = timespec64_to_timespec(inode->i_mtime);
+	ts = inode->i_mtime;
 	err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
 				   &ci->i_layout, snapc, page_off, len,
 				   ceph_wbc.truncate_seq,
@@ -1134,7 +1134,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 			pages = NULL;
 		}
 
-		req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+		req->r_mtime = inode->i_mtime;
 		rc = ceph_osdc_start_request(&fsc->client->osdc, req, true);
 		BUG_ON(rc);
 		req = NULL;
@@ -1734,7 +1734,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 		goto out;
 	}
 
-	req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+	req->r_mtime = inode->i_mtime;
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1776,7 +1776,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 			goto out_put;
 	}
 
-	req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+	req->r_mtime = inode->i_mtime;
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1937,7 +1937,7 @@ static int __ceph_pool_perm_get(struct ceph_inode_info *ci,
 				     0, false, true);
 	err = ceph_osdc_start_request(&fsc->client->osdc, rd_req, false);
 
-	wr_req->r_mtime = timespec64_to_timespec(ci->vfs_inode.i_mtime);
+	wr_req->r_mtime = ci->vfs_inode.i_mtime;
 	err2 = ceph_osdc_start_request(&fsc->client->osdc, wr_req, false);
 
 	if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ad0bed99b1d5..1795a8dc9a1e 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -721,7 +721,7 @@ struct ceph_aio_request {
 	struct list_head osd_reqs;
 	unsigned num_reqs;
 	atomic_t pending_reqs;
-	struct timespec mtime;
+	struct timespec64 mtime;
 	struct ceph_cap_flush *prealloc_cf;
 };
 
@@ -923,7 +923,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	int num_pages = 0;
 	int flags;
 	int ret;
-	struct timespec mtime = timespec64_to_timespec(current_time(inode));
+	struct timespec64 mtime = current_time(inode);
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	bool write = iov_iter_rw(iter) == WRITE;
@@ -1013,7 +1013,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 			truncate_inode_pages_range(inode->i_mapping, pos,
 					(pos+len) | (PAGE_SIZE - 1));
 
-			req->r_mtime = mtime;
+			req->r_mtime = current_time(inode);
 		}
 
 		osd_req_op_extent_osd_data_bvecs(req, 0, bvecs, num_pages, len);
@@ -1131,7 +1131,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	int flags;
 	int ret;
 	bool check_caps = false;
-	struct timespec mtime = timespec64_to_timespec(current_time(inode));
 	size_t count = iov_iter_count(from);
 
 	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
@@ -1201,7 +1200,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
 						false, true);
 
-		req->r_mtime = mtime;
+		req->r_mtime = current_time(inode);
 		ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1663,7 +1662,7 @@ static int ceph_zero_partial_object(struct inode *inode,
 		goto out;
 	}
 
-	req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+	req->r_mtime = inode->i_mtime;
 	ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!ret) {
 		ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 0d6ee04b4c41..2e6611c1e9a0 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -199,7 +199,7 @@ struct ceph_osd_request {
 	/* set by submitter */
 	u64 r_snapid;                         /* for reads, CEPH_NOSNAP o/w */
 	struct ceph_snap_context *r_snapc;    /* for writes */
-	struct timespec r_mtime;              /* ditto */
+	struct timespec64 r_mtime;            /* ditto */
 	u64 r_data_offset;                    /* ditto */
 	bool r_linger;                        /* don't resend on failure */
 
@@ -253,7 +253,7 @@ struct ceph_osd_linger_request {
 	struct ceph_osd_request_target t;
 	u32 map_dne_bound;
 
-	struct timespec mtime;
+	struct timespec64 mtime;
 
 	struct kref kref;
 	struct mutex lock;
@@ -508,7 +508,7 @@ extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
 				struct ceph_snap_context *sc,
 				u64 off, u64 len,
 				u32 truncate_seq, u64 truncate_size,
-				struct timespec *mtime,
+				struct timespec64 *mtime,
 				struct page **pages, int nr_pages);
 
 /* watch/notify */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index a00c74f1154e..a87a021ca9d0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1978,7 +1978,7 @@ static void encode_request_partial(struct ceph_osd_request *req,
 	p += sizeof(struct ceph_blkin_trace_info);
 
 	ceph_encode_32(&p, 0); /* client_inc, always 0 */
-	ceph_encode_timespec(p, &req->r_mtime);
+	ceph_encode_timespec64(p, &req->r_mtime);
 	p += sizeof(struct ceph_timespec);
 
 	encode_oloc(&p, end, &req->r_t.target_oloc);
@@ -4512,7 +4512,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc,
 	ceph_oid_copy(&lreq->t.base_oid, oid);
 	ceph_oloc_copy(&lreq->t.base_oloc, oloc);
 	lreq->t.flags = CEPH_OSD_FLAG_WRITE;
-	ktime_get_real_ts(&lreq->mtime);
+	ktime_get_real_ts64(&lreq->mtime);
 
 	lreq->reg_req = alloc_linger_request(lreq);
 	if (!lreq->reg_req) {
@@ -4570,7 +4570,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc,
 	ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid);
 	ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc);
 	req->r_flags = CEPH_OSD_FLAG_WRITE;
-	ktime_get_real_ts(&req->r_mtime);
+	ktime_get_real_ts64(&req->r_mtime);
 	osd_req_op_watch_init(req, 0, lreq->linger_id,
 			      CEPH_OSD_WATCH_OP_UNWATCH);
 
@@ -5136,7 +5136,7 @@ int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
 			 struct ceph_snap_context *snapc,
 			 u64 off, u64 len,
 			 u32 truncate_seq, u64 truncate_size,
-			 struct timespec *mtime,
+			 struct timespec64 *mtime,
 			 struct page **pages, int num_pages)
 {
 	struct ceph_osd_request *req;
-- 
2.9.0

_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038

^ permalink raw reply related

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-20 16:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, qemu-devel, virtualization,
	Siwei Liu, Venu Busireddy, Netdev, boris.ostrovsky, aaron.f.brown,
	Joao Martins
In-Reply-To: <20180620170904-mutt-send-email-mst@kernel.org>

On Wed, 20 Jun 2018 17:11:59 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Jun 20, 2018 at 11:53:59AM +0200, Cornelia Huck wrote:
> > On Tue, 19 Jun 2018 23:32:06 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:  
> > > > Sorry about dragging mainframes into this, but this will only work for
> > > > homogenous device coupling, not for heterogenous. Consider my vfio-pci
> > > > + virtio-net-ccw example again: The guest cannot find out that the two
> > > > belong together by checking some group ID, it has to either use the MAC
> > > > or some needs-to-be-architectured property.
> > > > 
> > > > Alternatively, we could propose that mechanism as pci-only, which means
> > > > we can rely on mechanisms that won't necessarily work on non-pci
> > > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> > > > through a network card anytime in the near future, due to the nature of
> > > > network cards currently in use on s390.)    
> > > 
> > > That's what it boils down to, yes.  If there's need to have this for
> > > non-pci devices, then we should put it in config space.
> > > Cornelia, what do you think?
> > >   
> > 
> > I think the only really useful config on s390 is the vfio-pci network
> > card coupled with a virtio-net-ccw device: Using an s390 network card
> > via vfio-ccw is out due to the nature of the s390 network cards, and
> > virtio-ccw is the default transport (virtio-pci is not supported on any
> > enterprise distro AFAIK).
> > 
> > For this, having a uuid in the config space could work (vfio-pci
> > devices have a config space by virtue of being pci devices, and
> > virtio-net-ccw devices have a config space by virtue of being virtio
> > devices -- ccw devices usually don't have that concept).  
> 
> OK so this calls for adding such a field generally (it's
> device agnostic right now).
> 
> How would you suggest doing that?

I hope that I'm not thoroughly confused at this point in time, so I'll
summarize my current understanding (also keep in mind that I haven't
looked at Venu's patches yet):

- The Linux guest initiates coupling from the virtio-net driver.
  Matching the other device is done via the MAC, and only pci devices
  are allowed for the failover device. (There does not seem to be any
  restriction on the transport of the virtio-net device.)
- The Linux guest virtio-net driver does not allow changing the MAC if
  standby has been negotiated (implying that the hypervisor needs to
  configure the correct MAC).
- In QEMU, we need to know which two devices (vfio-pci and virtio-net)
  go together, so that the virtio-net device gets the correct MAC. We
  also need the pairing so that we can make the vfio-pci device
  available once the guest has negotiated the standby feature.

We can tack the two devices together in QEMU by introducing new,
optional properties pointing from the virtio-net device to the vfio-pci
device (only offer standby if this is set) and the other way around
(don't make the device visible at the start if this is set). Problems:

- The admin needs to figure out the MAC by themselves and set it
  correctly. If this is incorrect, the vfio-pci device cannot be found
  in the guest. (Not sure how much of a problem this is in practice --
  and QEMU cannot figure out the MAC without poking at the vfio-pci
  device, and we probably want to avoid that.)
- This two-way pointing makes for interesting handing of the command
  line and when both devices are plugged later.

In any case, I'm not sure anymore why we'd want the extra uuid. Is
there any way QEMU (or libvirt) can figure it out without actually
looking at the vfio-pci device?

^ permalink raw reply

* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-20 16:08 UTC (permalink / raw)
  To: Fu, Qiaobin, davem@davemloft.net
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Michel Machado,
	Marcelo Ricardo Leitner, xiyou.wangcong@gmail.com
In-Reply-To: <E5616B71-959E-43FE-9096-BF523948ABB2@bu.edu>

On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
> 
> v4:
> *Not allow setting flags other than the expected ones.
> 
> *Allow dumping the pure flags.
> 
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> 

[...]

> @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>  
>  	if (d->flags & SKBEDIT_F_PRIORITY)
>  		skb->priority = d->priority;
> +	if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
> +		int wlen = skb_network_offset(skb);
> +
> +		switch (tc_skb_protocol(skb)) {
> +		case htons(ETH_P_IP):
> +			wlen += sizeof(struct iphdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				goto err;
> +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> +			break;
> +
> +		case htons(ETH_P_IPV6):
> +			wlen += sizeof(struct ipv6hdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				goto err;
> +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> +			break;
> +		}
> +	}
>  	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
>  	    skb->dev->real_num_tx_queues > d->queue_mapping)
>  		skb_set_queue_mapping(skb, d->queue_mapping);
> @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>  
>  	spin_unlock(&d->tcf_lock);
>  	return d->tcf_action;
> +
> +err:
> +	spin_unlock(&d->tcf_lock);
> +	return TC_ACT_SHOT;
>  }
>  

sorry for asking this when the patch is a v4... 

I spotted this, as I'm rebasing a small series that removes the tcf_lock
from the data plane of skbedit to gain some speed, and it converts the
stats to be per-cpu counters.

in the code above, you are catching failures of pskb_may_pull(skb) and
then you return TC_ACT_SHOT. That's OK, but I think you should update the
drop counter, like other TC actions do.

If you (author / reviewers) think this is a minor issue, like I do think,
then I can add the missing update in my series and post it when net-next
reopens.

WDYT?

thank you in advance!
regards,
-- 
davide
  

^ permalink raw reply

* [PATCH v2] ucc_geth: Add BQL support
From: Joakim Tjernlund @ 2018-06-20 16:29 UTC (permalink / raw)
  To: David Miller, leoyang.li, netdev; +Cc: Joakim Tjernlund
In-Reply-To: <20180619163036.20578-1-joakim.tjernlund@infinera.com>

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---

 v2 - Reoder varibles according to Dave
      Add call to netdev_reset_queue(dev) open/close
 drivers/net/ethernet/freescale/ucc_geth.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index f77ba9fa257b..e8debbde0a34 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3096,6 +3096,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	ugeth_vdbg("%s: IN", __func__);
 
+	netdev_sent_queue(dev, skb->len);
 	spin_lock_irqsave(&ugeth->lock, flags);
 
 	dev->stats.tx_bytes += skb->len;
@@ -3240,6 +3241,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 {
 	/* Start from the next BD that should be filled */
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
+	unsigned int bytes_sent = 0;
+	int howmany = 0;
 	u8 __iomem *bd;		/* BD pointer */
 	u32 bd_status;
 
@@ -3257,7 +3260,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 		skb = ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]];
 		if (!skb)
 			break;
-
+		howmany++;
+		bytes_sent += skb->len;
 		dev->stats.tx_packets++;
 
 		dev_consume_skb_any(skb);
@@ -3279,6 +3283,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 		bd_status = in_be32((u32 __iomem *)bd);
 	}
 	ugeth->confBd[txQ] = bd;
+	netdev_completed_queue(dev, howmany, bytes_sent);
 	return 0;
 }
 
@@ -3479,6 +3484,7 @@ static int ucc_geth_open(struct net_device *dev)
 
 	phy_start(ugeth->phydev);
 	napi_enable(&ugeth->napi);
+	netdev_reset_queue(dev);
 	netif_start_queue(dev);
 
 	device_set_wakeup_capable(&dev->dev,
@@ -3509,6 +3515,7 @@ static int ucc_geth_close(struct net_device *dev)
 	free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
 
 	netif_stop_queue(dev);
+	netdev_reset_queue(dev);
 
 	return 0;
 }
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH] tc: fix batch force option
From: Stephen Hemminger @ 2018-06-20 16:33 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, chrism
In-Reply-To: <1529479461-32048-1-git-send-email-vladbu@mellanox.com>

On Wed, 20 Jun 2018 10:24:21 +0300
Vlad Buslov <vladbu@mellanox.com> wrote:

> When sending accumulated compound command results an error, check 'force'
> option before exiting. Move return code check after putting batch bufs and
> freeing iovs to prevent memory leak. Break from loop, instead of returning
> error code to allow cleanup at the end of batch function. Don't reset ret
> code on each iteration.
> 
> Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Chris Mi <chrism@mellanox.com>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Applied

^ permalink raw reply

* [PATCH net] netfilter: nf_log: fix uninit read in nf_log_proc_dostring
From: Jann Horn @ 2018-06-20 16:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, jannh
  Cc: David S. Miller, netdev, linux-kernel

When proc_dostring() is called with a non-zero offset in strict mode, it
doesn't just write to the ->data buffer, it also reads. Make sure it
doesn't read uninitialized data.

Fixes: c6ac37d8d884 ("netfilter: nf_log: fix error on write NONE to [...]")
Signed-off-by: Jann Horn <jannh@google.com>
---
 net/netfilter/nf_log.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 426457047578..2c47f9ec3511 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -424,6 +424,10 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write,
 	if (write) {
 		struct ctl_table tmp = *table;
 
+		/* proc_dostring() can append to existing strings, so we need to
+		 * initialize it as an empty string.
+		 */
+		buf[0] = '\0';
 		tmp.data = buf;
 		r = proc_dostring(&tmp, write, buffer, lenp, ppos);
 		if (r)
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
From: Andy C @ 2018-06-20 16:40 UTC (permalink / raw)
  To: Joe Perches, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden
In-Reply-To: <0dead6e0deabafe0abdd4b30f97bcdf5c12cd824.camel@perches.com>

On 2018-06-19 22:56, Joe Perches wrote:
> On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:
>
> You really need to check your clock.
> Mail sent in the year 2010?
>
I didn't send it just now. I sent it on 2010-08-09 at 10:27. And I did 
receive a copy at that time. If I'm reading the headers correctly, 
something happened here:

Received: from mchehab by bombadil.infradead.org with local (Exim 4.90_1 #2 (Red Hat Linux))
	id 1fVNCq-0005da-3g; Tue, 19 Jun 2018 20:26:24 +0000
Received: from vger.kernel.org ([209.132.180.67])
	by bombadil.infradead.org with esmtp (Exim 4.72 #1 (Red Hat Linux))
	id 1OiOdm-0004za-Ks; Mon, 09 Aug 2010 09:27:30 +0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755981Ab0HIJ1U (ORCPT <rfc822;tgraf@infradead.org> + 8 others);
	Mon, 9 Aug 2010 05:27:20 -0400

-- 
Andy

^ permalink raw reply

* Re: [PATCH] tc, bpf: add option to dump bpf verifier as C program fragment
From: Stephen Hemminger @ 2018-06-20 16:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Ahern, Jakub Kicinski, Ophir Munk, netdev, Thomas Monjalon,
	Olga Shern, ast
In-Reply-To: <e454ef12-0ae7-7122-3801-321a62c5d0ce@iogearbox.net>

On Wed, 20 Jun 2018 00:13:52 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 06/18/2018 11:44 PM, David Ahern wrote:
> > On 6/18/18 2:18 PM, Jakub Kicinski wrote:  
> >> On Sun, 17 Jun 2018 08:48:41 +0000, Ophir Munk wrote:  
> >>> Similar to cbpf used within tcpdump utility with a "-d" option to dump
> >>> the compiled packet-matching code in a human readable form - tc has the
> >>> "verbose" option to dump ebpf verifier output.
> >>> Another useful option of cbpf using tcpdump "-dd" option is to dump
> >>> packet-matching code a C program fragment. Similar to this - this commit
> >>> adds a new tc ebpf option named "code" to dump ebpf verifier as C program
> >>> fragment.
> >>>
> >>> Existing "verbose" option sample output:
> >>>
> >>> Verifier analysis:
> >>> 0: (61) r2 = *(u32 *)(r1 +52)
> >>> 1: (18) r3 = 0xdeadbeef
> >>> 3: (63) *(u32 *)(r10 -4) = r3
> >>> .
> >>> .
> >>> 11: (63) *(u32 *)(r1 +52) = r2
> >>> 12: (18) r0 = 0xffffffff
> >>> 14: (95) exit
> >>>
> >>> New "code" option sample output:
> >>>
> >>> /* struct bpf_insn cls_q_code[] = { */
> >>> {0x61,    2,    1,       52, 0x00000000},
> >>> {0x18,    3,    0,        0, 0xdeadbeef},
> >>> {0x00,    0,    0,        0, 0x00000000},
> >>> .
> >>> .
> >>> {0x63,    1,    2,       52, 0x00000000},
> >>> {0x18,    0,    0,        0, 0xffffffff},
> >>> {0x00,    0,    0,        0, 0x00000000},
> >>> {0x95,    0,    0,        0, 0x00000000},
> >>>
> >>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>  
> >>
> >> Hmm... printing C arrays looks like hacky integration with some C
> >> code...  Would you not be better served by simply using libbpf in
> >> whatever is consuming this output?  
> > 
> > I was thinking the same. bpftool would provide options too -- print the
> > above, print in macro encodings and verifier. I gave an example of this
> > side by side dump at netconf 2.1. Does not look like the slides made it
> > online; see attached.  
> 
> +1, I would also doubt that this adds a lot in terms of debuggability
> when you're trying to load an object file with thousands of insns. Better
> way would be to use llvm-objdump on the obj file to get to the annotated
> disassembly, see also example in [0]. A .o to .c converter is wip for
> libbpf/bpftool as presented in [1], which should provide the flexibility
> to embedd an obj file.
> 
> Cheers,
> Daniel
> 
>   [0] http://cilium.readthedocs.io/en/latest/bpf/#llvm
>   [1] http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf page 22

I am going to not accept this for now. Please respin for iproute2 next if
you think bpftool won't be able to handle this.

^ permalink raw reply

* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Michel Machado @ 2018-06-20 16:47 UTC (permalink / raw)
  To: Davide Caratti, Fu, Qiaobin, davem@davemloft.net
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Marcelo Ricardo Leitner,
	xiyou.wangcong@gmail.com
In-Reply-To: <264a03455daf28d33809ea77e4badc6a5ec5c9e6.camel@redhat.com>

On 06/20/2018 12:08 PM, Davide Caratti wrote:
> On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
>> The new action inheritdsfield copies the field DS of
>> IPv4 and IPv6 packets into skb->priority. This enables
>> later classification of packets based on the DS field.
>>
>> v4:
>> *Not allow setting flags other than the expected ones.
>>
>> *Allow dumping the pure flags.
>>
>> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>>
>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>> ---
>>
> 
> [...]
> 
>> @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>>   
>>   	if (d->flags & SKBEDIT_F_PRIORITY)
>>   		skb->priority = d->priority;
>> +	if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
>> +		int wlen = skb_network_offset(skb);
>> +
>> +		switch (tc_skb_protocol(skb)) {
>> +		case htons(ETH_P_IP):
>> +			wlen += sizeof(struct iphdr);
>> +			if (!pskb_may_pull(skb, wlen))
>> +				goto err;
>> +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
>> +			break;
>> +
>> +		case htons(ETH_P_IPV6):
>> +			wlen += sizeof(struct ipv6hdr);
>> +			if (!pskb_may_pull(skb, wlen))
>> +				goto err;
>> +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
>> +			break;
>> +		}
>> +	}
>>   	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
>>   	    skb->dev->real_num_tx_queues > d->queue_mapping)
>>   		skb_set_queue_mapping(skb, d->queue_mapping);
>> @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>>   
>>   	spin_unlock(&d->tcf_lock);
>>   	return d->tcf_action;
>> +
>> +err:
>> +	spin_unlock(&d->tcf_lock);
>> +	return TC_ACT_SHOT;
>>   }
>>   
> 
> sorry for asking this when the patch is a v4...
> 
> I spotted this, as I'm rebasing a small series that removes the tcf_lock
> from the data plane of skbedit to gain some speed, and it converts the
> stats to be per-cpu counters.
> 
> in the code above, you are catching failures of pskb_may_pull(skb) and
> then you return TC_ACT_SHOT. That's OK, but I think you should update the
> drop counter, like other TC actions do.
> 
> If you (author / reviewers) think this is a minor issue, like I do think,
> then I can add the missing update in my series and post it when net-next
> reopens.
> 
> WDYT?
> 
> thank you in advance!
> regards,
> 

Hi Davide,

    I agree that we should update the drop counter, but given that 
you're already converting the stats to be per-cpu counters, whatever we 
add now will be just symbolic since you're going to change it anyway. If 
reviewers think that Qiaobin's patch must add the update line, could you 
provide the exact line and location so we avoid going to v6 of this patch?

[ ]'s
Michel Machado

^ permalink raw reply

* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-20 17:02 UTC (permalink / raw)
  To: Michel Machado, Fu, Qiaobin, davem@davemloft.net
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Marcelo Ricardo Leitner,
	xiyou.wangcong@gmail.com
In-Reply-To: <9e0d0a90-4ad5-722f-1033-40b72c3841ed@digirati.com.br>

On Wed, 2018-06-20 at 12:47 -0400, Michel Machado wrote:
> On 06/20/2018 12:08 PM, Davide Caratti wrote:
> > On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
> > > The new action inheritdsfield copies the field DS of
> > > IPv4 and IPv6 packets into skb->priority. This enables
> > > later classification of packets based on the DS field.
> > > 
> > > v4:
> > > *Not allow setting flags other than the expected ones.
> > > 
> > > *Allow dumping the pure flags.
> > > 
> > > Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> > > 
> > > Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> > > Reviewed-by: Michel Machado <michel@digirati.com.br>
> > > ---
> > > 
> > 
> > [...]
> > 
> > > @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
> > >   
> > >   	if (d->flags & SKBEDIT_F_PRIORITY)
> > >   		skb->priority = d->priority;
> > > +	if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
> > > +		int wlen = skb_network_offset(skb);
> > > +
> > > +		switch (tc_skb_protocol(skb)) {
> > > +		case htons(ETH_P_IP):
> > > +			wlen += sizeof(struct iphdr);
> > > +			if (!pskb_may_pull(skb, wlen))
> > > +				goto err;
> > > +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> > > +			break;
> > > +
> > > +		case htons(ETH_P_IPV6):
> > > +			wlen += sizeof(struct ipv6hdr);
> > > +			if (!pskb_may_pull(skb, wlen))
> > > +				goto err;
> > > +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> > > +			break;
> > > +		}
> > > +	}
> > >   	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
> > >   	    skb->dev->real_num_tx_queues > d->queue_mapping)
> > >   		skb_set_queue_mapping(skb, d->queue_mapping);
> > > @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
> > >   
> > >   	spin_unlock(&d->tcf_lock);
> > >   	return d->tcf_action;
> > > +
> > > +err:
> > > +	spin_unlock(&d->tcf_lock);
> > > +	return TC_ACT_SHOT;
> > >   }
> > >   
> > 
> > sorry for asking this when the patch is a v4...
> > 
> > I spotted this, as I'm rebasing a small series that removes the tcf_lock
> > from the data plane of skbedit to gain some speed, and it converts the
> > stats to be per-cpu counters.
> > 
> > in the code above, you are catching failures of pskb_may_pull(skb) and
> > then you return TC_ACT_SHOT. That's OK, but I think you should update the
> > drop counter, like other TC actions do.
> > 
> > If you (author / reviewers) think this is a minor issue, like I do think,
> > then I can add the missing update in my series and post it when net-next
> > reopens.
> > 
> > WDYT?
> > 
> > thank you in advance!
> > regards,
> > 
> 
> Hi Davide,
> 
>     I agree that we should update the drop counter, but given that 
> you're already converting the stats to be per-cpu counters, whatever we 
> add now will be just symbolic since you're going to change it anyway.

that's ok for me also, as I can use the current v4 code for the rebase
(and not wait for another respin) _ but let's hear what reviewers think.

>  If 
> reviewers think that Qiaobin's patch must add the update line, could you 
> provide the exact line and location so we avoid going to v6 of this patch?

In case, I was thinking of something like: 

https://elixir.bootlin.com/linux/v4.18-rc1/source/net/sched/act_ipt.c#L249

so, between 'err:' and 'spin_unlock(&d->tcf_lock)', insert a line like:

d->tcf_qstats.drop++;

regards,
-- 
davide 

^ permalink raw reply

* Re: [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: Neil Horman @ 2018-06-20 17:08 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, linux-sctp, Vlad Yasevich
In-Reply-To: <d89c1e422158d21710ce938aa093a20960bd55e9.1529509634.git.marcelo.leitner@gmail.com>

On Wed, Jun 20, 2018 at 12:47:52PM -0300, Marcelo Ricardo Leitner wrote:
> Currently it is incrementing SctpFragUsrMsgs when the user message size
> is of the exactly same size as the maximum fragment size, which is wrong.
> 
> The fix is to increment it only when user message is bigger than the
> maximum fragment size.
> 
> Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  net/sctp/chunk.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 79daa98208c391c780440144d69bc7be875c3476..bfb9f812e2ef9fa605b08dc1f534781573c3abf8 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -237,7 +237,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>  	/* Account for a different sized first fragment */
>  	if (msg_len >= first_len) {
>  		msg->can_delay = 0;
> -		SCTP_INC_STATS(sock_net(asoc->base.sk), SCTP_MIB_FRAGUSRMSGS);
> +		if (msg_len > first_len)
> +			SCTP_INC_STATS(sock_net(asoc->base.sk),
> +				       SCTP_MIB_FRAGUSRMSGS);
>  	} else {
>  		/* Which may be the only one... */
>  		first_len = msg_len;
> -- 
> 2.14.4
> 
> 
Acked-by: Neil Horman <nhorman@redhat.com>

^ permalink raw reply

* Re: [PATCH] net: Fix device name resolving crash in default_device_exit()
From: David Ahern @ 2018-06-20 17:15 UTC (permalink / raw)
  To: Kirill Tkhai, netdev
  Cc: davem, daniel, jakub.kicinski, ast, linux, john.fastabend, brouer
In-Reply-To: <88c1a635-4404-253b-6144-d7f3f9779531@virtuozzo.com>

On 6/20/18 2:57 AM, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> The following script makes kernel to crash since it can't obtain
> a name for a device, when the name is occupied by another device:
> 
> #!/bin/bash
> ifconfig eth0 down
> ifconfig eth1 down
> index=`cat /sys/class/net/eth1/ifindex`
> ip link set eth1 name dev$index
> unshare -n sleep 1h &
> pid=$!
> while [[ "`readlink /proc/self/ns/net`" == "`readlink /proc/$pid/ns/net`" ]]; do continue; done
> ip link set dev$index netns $pid
> ip link set eth0 name dev$index
> kill -9 $pid
> 
> Kernel messages:
> 
> virtio_net virtio1 dev3: renamed from eth1
> virtio_net virtio0 dev3: renamed from eth0
> default_device_exit: failed to move dev3 to init_net: -17
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:8978!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 1 PID: 276 Comm: kworker/u8:3 Not tainted 4.17.0+ #292
> Workqueue: netns cleanup_net
> RIP: 0010:default_device_exit+0x9c/0xb0
> [stack trace snipped]
> 
> This patch gives more variability during choosing new name
> of device and fixes the problem.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
> 
> Since there is no suggestions how to fix this in another way, I'm resending the patch.

This patch does not remove the BUG, so does not really solve the
problem. ie., it is fairly trivial to write a script (32k dev%d named
devices in init_net) that triggers it again, so your commit subject and
commit log are not correct with the references to 'fixing the problem'.

The change does provide more variability in naming and reduces the
likelihood of not being able to push a device back to init_net.

^ permalink raw reply

* Re: [PATCH v3 07/14] net: smc911x: remove the dmaengine compat need
From: Robert Jarzmik @ 2018-06-20 17:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: Daniel Mack, linux-kernel, netdev
In-Reply-To: <20180617170217.24177-8-robert.jarzmik@free.fr>

Hi David,

I have converted all the pxa related drivers to DMA slave maps, the only 2 that
remain in my queue are touching your tree, ie. smc911x and smc91x (see [1]).

Once these 2 are done, I have my last dependency on the dma filter function
removed, and can queue the remaining cleanup patches.

Could you (or somebody from netdev) review it and either ack it (and I'll take
it through the pxa tree), or take it for v4.19 please ?

Cheers.

--
Robert

[1] Submission
Robert Jarzmik <robert.jarzmik@free.fr> writes:

> As the pxa architecture switched towards the dmaengine slave map, the
> old compatibility mechanism to acquire the dma requestor line number and
> priority are not needed anymore.
>
> This patch simplifies the dma resource acquisition, using the more
> generic function dma_request_slave_channel().
>
> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
> ---
> Since v2: converted to NULL filter function and NULL dma parameter call
> ---
>  drivers/net/ethernet/smsc/smc911x.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/smsc/smc911x.c b/drivers/net/ethernet/smsc/smc911x.c
> index 05157442a980..b1b53f6c452f 100644
> --- a/drivers/net/ethernet/smsc/smc911x.c
> +++ b/drivers/net/ethernet/smsc/smc911x.c
> @@ -74,7 +74,6 @@ static const char version[] =
>  #include <linux/skbuff.h>
>  
>  #include <linux/dmaengine.h>
> -#include <linux/dma/pxa-dma.h>
>  
>  #include <asm/io.h>
>  
> @@ -1795,7 +1794,6 @@ static int smc911x_probe(struct net_device *dev)
>  #ifdef SMC_USE_DMA
>  	struct dma_slave_config	config;
>  	dma_cap_mask_t mask;
> -	struct pxad_param param;
>  #endif
>  
>  	DBG(SMC_DEBUG_FUNC, dev, "--> %s\n", __func__);
> @@ -1971,15 +1969,8 @@ static int smc911x_probe(struct net_device *dev)
>  
>  	dma_cap_zero(mask);
>  	dma_cap_set(DMA_SLAVE, mask);
> -	param.prio = PXAD_PRIO_LOWEST;
> -	param.drcmr = -1UL;
> -
> -	lp->rxdma =
> -		dma_request_slave_channel_compat(mask, pxad_filter_fn,
> -						 &param, &dev->dev, "rx");
> -	lp->txdma =
> -		dma_request_slave_channel_compat(mask, pxad_filter_fn,
> -						 &param, &dev->dev, "tx");
> +	lp->rxdma = dma_request_channel(mask, NULL, NULL);
> +	lp->txdma = dma_request_channel(mask, NULL, NULL);
>  	lp->rxdma_active = 0;
>  	lp->txdma_active = 0;

-- 
Robert

^ permalink raw reply

* Re: [PATCH v3 08/14] net: smc91x: remove the dmaengine compat need
From: Robert Jarzmik @ 2018-06-20 17:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: Daniel Mack, linux-kernel, netdev
In-Reply-To: <20180617170217.24177-9-robert.jarzmik@free.fr>

Hi David,

I have converted all the pxa related drivers to DMA slave maps, the only 2 that
remain in my queue are touching your tree, ie. smc911x and smc91x (see [1]).

Once these 2 are done, I have my last dependency on the dma filter function
removed, and can queue the remaining cleanup patches.

Could you (or somebody from netdev) review it and either ack it (and I'll take
it through the pxa tree), or take it for v4.19 please ?

Cheers.

--
Robert

[1] Submission
Robert Jarzmik <robert.jarzmik@free.fr> writes:

> As the pxa architecture switched towards the dmaengine slave map, the
> old compatibility mechanism to acquire the dma requestor line number and
> priority are not needed anymore.
>
> This patch simplifies the dma resource acquisition, using the more
> generic function dma_request_slave_channel().
>
> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
> ---
> Since v2: converted to NULL filter function and NULL dma parameter call
> ---
>  drivers/net/ethernet/smsc/smc91x.c | 9 +--------
>  drivers/net/ethernet/smsc/smc91x.h | 1 -
>  2 files changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
> index 080428762858..b944828f9ea3 100644
> --- a/drivers/net/ethernet/smsc/smc91x.c
> +++ b/drivers/net/ethernet/smsc/smc91x.c
> @@ -2019,17 +2019,10 @@ static int smc_probe(struct net_device *dev, void __iomem *ioaddr,
>  #  endif
>  	if (lp->cfg.flags & SMC91X_USE_DMA) {
>  		dma_cap_mask_t mask;
> -		struct pxad_param param;
>  
>  		dma_cap_zero(mask);
>  		dma_cap_set(DMA_SLAVE, mask);
> -		param.prio = PXAD_PRIO_LOWEST;
> -		param.drcmr = -1UL;
> -
> -		lp->dma_chan =
> -			dma_request_slave_channel_compat(mask, pxad_filter_fn,
> -							 &param, &dev->dev,
> -							 "data");
> +		lp->dma_chan = dma_request_channel(mask, NULL, NULL);
>  	}
>  #endif
>  
> diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
> index b337ee97e0c0..a27352229fc2 100644
> --- a/drivers/net/ethernet/smsc/smc91x.h
> +++ b/drivers/net/ethernet/smsc/smc91x.h
> @@ -301,7 +301,6 @@ struct smc_local {
>   * as RX which can overrun memory and lose packets.
>   */
>  #include <linux/dma-mapping.h>
> -#include <linux/dma/pxa-dma.h>
>  
>  #ifdef SMC_insl
>  #undef SMC_insl

-- 
Robert

^ permalink raw reply

* Re: [PATCH iproute2-next v3] ip-xfrm: Add support for OUTPUT_MARK
From: David Ahern @ 2018-06-20 17:18 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan, lorenzo, netdev, stephen,
	steffen.klassert
In-Reply-To: <1529116375-7578-1-git-send-email-subashab@codeaurora.org>

On 6/15/18 8:32 PM, Subash Abhinov Kasiviswanathan wrote:
> This patch adds support for OUTPUT_MARK in xfrm state to exercise the
> functionality added by kernel commit 077fbac405bf
> ("net: xfrm: support setting an output mark.").
> 

...

> v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
> xfrm_xfrma_print() as mentioned by Lorenzo
> 
> v2->v3: Fix one help formatting error as mentioned by Lorenzo.
> Keep mark and output-mark on the same line and add man page info as
> mentioned by David.
> 
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  ip/ipxfrm.c        | 17 ++++++++++++++++-
>  ip/xfrm_state.c    |  9 +++++++++
>  man/man8/ip-xfrm.8 | 11 +++++++++++
>  3 files changed, 36 insertions(+), 1 deletion(-)
> 

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH iproute2-next 1/1] tc: jsonify nat action
From: David Ahern @ 2018-06-20 17:24 UTC (permalink / raw)
  To: Keara Leibovitz, dsahern; +Cc: stephen, netdev, kernel
In-Reply-To: <1529348269-30039-1-git-send-email-kleib@mojatatu.com>

On 6/18/18 12:57 PM, Keara Leibovitz wrote:
> Add json output support for nat action
> 

...

> 
> Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
> ---
>  tc/m_nat.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 


applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH net][RESEND] strparser: Don't schedule in workqueue in paused state
From: Dave Watson @ 2018-06-20 17:32 UTC (permalink / raw)
  To: Vakul Garg
  Cc: davem, doronrk, tom, john.fastabend, netdev, ebiggers,
	linux-kernel
In-Reply-To: <20180620215949.32334-1-vakul.garg@nxp.com>

On 06/21/18 03:29 AM, Vakul Garg wrote:
> In function strp_data_ready(), it is useless to call queue_work if
> the state of strparser is already paused. The state checking should
> be done before calling queue_work. The change reduces the context
> switches and improves the ktls-rx throughput by approx 20% (measured
> on cortex-a53 based platform).

Nice find, works for me. Although current code isn't broken, just
slower, net-next would be fine.

Acked-by: Dave Watson <davejwatson@fb.com>

^ permalink raw reply

* [PATCH net] cls_flower: fix use after free in flower S/W path
From: Paolo Abeni @ 2018-06-20 17:34 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner,
	Paul Blakey

If flower filter is created without the skip_sw flag, fl_mask_put()
can race with fl_classify() and we can destroy the mask rhashtable
while a lookup operation is accessing it.

 BUG: unable to handle kernel paging request at 00000000000911d1
 PGD 0 P4D 0
 SMP PTI
 CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 RIP: 0010:rht_bucket_nested+0x20/0x60
 Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
 RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
 RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
 RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
 RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
 R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
 R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
 FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
 Call Trace:
  fl_lookup+0x134/0x140 [cls_flower]
  fl_classify+0xf3/0x180 [cls_flower]
  tcf_classify+0x78/0x150
  __netif_receive_skb_core+0x69e/0xa50
  netif_receive_skb_internal+0x42/0xf0
  tun_get_user+0xdd5/0xfd0 [tun]
  tun_sendmsg+0x52/0x70 [tun]
  handle_tx+0x2b3/0x5f0 [vhost_net]
  vhost_worker+0xab/0x100 [vhost]
  kthread+0xf8/0x130
  ret_from_fork+0x35/0x40
 Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
 CR2: 00000000000911d1

Fix the above waiting for a RCU grace period before destroying the
rhashtable.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/cls_flower.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2b5be42a9f1c..5e7333c6472d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -203,6 +203,17 @@ static int fl_init(struct tcf_proto *tp)
 	return rhashtable_init(&head->ht, &mask_ht_params);
 }
 
+static void fl_mask_free(struct fl_flow_mask *mask)
+{
+	rhashtable_destroy(&mask->ht);
+	kfree(mask);
+}
+
+static void fl_mask_free_rcu(struct rcu_head *entry)
+{
+	fl_mask_free(container_of(entry, struct fl_flow_mask, rcu));
+}
+
 static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 			bool async)
 {
@@ -210,12 +221,11 @@ static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 		return false;
 
 	rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
-	rhashtable_destroy(&mask->ht);
 	list_del_rcu(&mask->list);
 	if (async)
-		kfree_rcu(mask, rcu);
+		call_rcu(&mask->rcu, fl_mask_free_rcu);
 	else
-		kfree(mask);
+		fl_mask_free(mask);
 
 	return true;
 }
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Cong Wang @ 2018-06-20 17:45 UTC (permalink / raw)
  To: David Miller
  Cc: Hangbin Liu, Linux Kernel Network Developers, Stefano Brivio,
	Paolo Abeni, Mahesh Bandewar
In-Reply-To: <20180620.143147.2291173423483856091.davem@davemloft.net>

On Tue, Jun 19, 2018 at 10:31 PM, David Miller <davem@davemloft.net> wrote:
> From: Hangbin Liu <liuhangbin@gmail.com>
> Date: Wed, 20 Jun 2018 11:22:54 +0800
>
>> The only case dev_change_flags() return an err is when we change IFF_UP flag.
>> Since we only set/reset IFF_NOARP, do you think we still need to check the
>> return value?
>
> It is bad to try and take shortcuts on error handling using assumptions
> like that.
>
> If dev_change_flags() is adjusted to return error codes in more
> situations, nobody is going to remember to undo your "optimziation"
> here.
>
> Please check for errors, thank you.

Yeah. Also since the notifier is triggered in this case:

        if (dev->flags & IFF_UP &&
            (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE))) {
                struct netdev_notifier_change_info change_info = {
                        .info = {
                                .dev = dev,
                        },
                        .flags_changed = changes,
                };

                call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
        }

the return value of call_netdevice_notifiers_info() isn't captured
either, but it should be.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox