Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] virtio-net: lower min ring num_free for efficiency
From: Jason Wang @ 2019-08-15  3:01 UTC (permalink / raw)
  To: ? jiang, mst@redhat.com
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org,
	jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB3205E4E194942B0A1A91A222A6AD0@BYAPR14MB3205.namprd14.prod.outlook.com>


On 2019/8/14 上午10:06, ? jiang wrote:
> This change lowers ring buffer reclaim threshold from 1/2*queue to budget
> for better performance. According to our test with qemu + dpdk, packet
> dropping happens when the guest is not able to provide free buffer in
> avail ring timely with default 1/2*queue. The value in the patch has been
> tested and does show better performance.


Please add your tests setup and result here.

Thanks


>
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>
> ---
>   drivers/net/virtio_net.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc08be7925eb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   		}
>   	}
>   
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
>   		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>   			schedule_delayed_work(&vi->refill, 0);
>   	}

^ permalink raw reply

* Re: [PATCH net-next] net/mvpp2: Replace tasklet with softirq hrtimer
From: David Miller @ 2019-08-15  3:02 UTC (permalink / raw)
  To: bigeasy; +Cc: netdev, tglx, maxime.chevallier
In-Reply-To: <20190813080025.2j3cj6gfyzzxek7x@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Tue, 13 Aug 2019 10:00:25 +0200

> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tx_done_tasklet tasklet is used in invoke the hrtimer
> (mvpp2_hr_timer_cb) in softirq context. This can be also achieved without
> the tasklet but with HRTIMER_MODE_SOFT as hrtimer mode.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Applied.

^ permalink raw reply

* Re: [PATCH] cxgb4: fix a memory leak bug
From: David Miller @ 2019-08-15  3:03 UTC (permalink / raw)
  To: wenwen; +Cc: vishal, netdev, linux-kernel
In-Reply-To: <1565687932-2870-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Tue, 13 Aug 2019 04:18:52 -0500

> In blocked_fl_write(), 't' is not deallocated if bitmap_parse_user() fails,
> leading to a memory leak bug. To fix this issue, free t before returning
> the error.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Applied.

^ permalink raw reply

* Re: [PATCH] virtio-net: lower min ring num_free for efficiency
From: Jason Wang @ 2019-08-15  3:17 UTC (permalink / raw)
  To: 冉 jiang, mst@redhat.com
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org,
	jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB3205B734E554EACEEE337ADDA6AC0@BYAPR14MB3205.namprd14.prod.outlook.com>


On 2019/8/15 上午11:11, 冉 jiang wrote:
> On 2019/8/15 11:01, Jason Wang wrote:
>> On 2019/8/14 上午10:06, ? jiang wrote:
>>> This change lowers ring buffer reclaim threshold from 1/2*queue to
>>> budget
>>> for better performance. According to our test with qemu + dpdk, packet
>>> dropping happens when the guest is not able to provide free buffer in
>>> avail ring timely with default 1/2*queue. The value in the patch has
>>> been
>>> tested and does show better performance.
>>
>> Please add your tests setup and result here.
>>
>> Thanks
>>
>>
>>> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>
>>> ---
>>>    drivers/net/virtio_net.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 0d4115c9e20b..bc08be7925eb 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue
>>> *rq, int budget,
>>>            }
>>>        }
>>>    -    if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
>>> +    if (rq->vq->num_free > min((unsigned int)budget,
>>> virtqueue_get_vring_size(rq->vq)) / 2) {
>>>            if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>>>                schedule_delayed_work(&vi->refill, 0);
>>>        }
> Sure, here are the details:


Thanks for the details, but I meant it's better if you could summarize 
you test result in the commit log in a compact way.

Btw, some comments, see below:


>
>
> Test setup & result:
>
> ----------------------------------------------------
>
> Below is the snippet from our test result. Test1 was done with default
> driver with the value of 1/2 * queue, while test2 is with my patch. We
> can see average
> drop packets do decrease a lot in test2.
>
> test1Time    avgDropPackets    test2Time    avgDropPackets pps
>
> 16:21.0    12.295    56:50.4    0 300k
> 17:19.1    15.244    56:50.4    0    300k
> 18:17.5    18.789    56:50.4    0    300k
> 19:15.1    14.208    56:50.4    0    300k
> 20:13.2    20.818    56:50.4    0.267    300k
> 21:11.2    12.397    56:50.4    0    300k
> 22:09.3    12.599    56:50.4    0    300k
> 23:07.3    15.531    57:48.4    0    300k
> 24:05.5    13.664    58:46.5    0    300k
> 25:03.7    13.158    59:44.5    4.73    300k
> 26:01.1    2.486    00:42.6    0    300k
> 26:59.1    11.241    01:40.6    0    300k
> 27:57.2    20.521    02:38.6    0    300k
> 28:55.2    30.094    03:36.7    0    300k
> 29:53.3    16.828    04:34.7    0.963    300k
> 30:51.3    46.916    05:32.8    0    400k
> 31:49.3    56.214    05:32.8    0    400k
> 32:47.3    58.69    05:32.8    0    400k
> 33:45.3    61.486    05:32.8    0    400k
> 34:43.3    72.175    05:32.8    0.598    400k
> 35:41.3    56.699    05:32.8    0    400k
> 36:39.3    61.071    05:32.8    0    400k
> 37:37.3    43.355    06:30.8    0    400k
> 38:35.4    44.644    06:30.8    0    400k
> 39:33.4    72.336    06:30.8    0    400k
> 40:31.4    70.676    06:30.8    0    400k
> 41:29.4    108.009    06:30.8    0    400k
> 42:27.4    65.216    06:30.8    0    400k


Why there're difference in test time? Could you summarize them like:

Test setup: e.g testpmd or pktgen to generate packets to guest

avg packets drop before: XXX

avg packets drop after: YYY(-ZZZ%)

Thanks


>
>
> Data to prove why the patch helps:
>
> ----------------------------------------------------
>
> We did have completed several rounds of test with setting the value to
> budget (64 as the default value). It does improve a lot with pps is
> below 400pps for a single stream. We are confident that it runs out of free
> buffer in avail ring when packet dropping happens with below systemtap:
>
> Just a snippet:
>
> probe module("virtio_ring").function("virtqueue_get_buf")
> {
>        x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)-
> (@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one
> to verify if the queue is full, which means guest is not able to take
> buffer from the queue timely
>
>        if (x<0 && (x+65535)<4096)
>            x = x+65535
>
>        if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback ==
> callback_addr)
>            netrxcount[x] <<< gettimeofday_s()
> }
>
>
> probe module("virtio_ring").function("virtqueue_add_inbuf")
> {
>        y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)-
> (@cast($vq, "vring_virtqueue")->vring->used->idx) ---> we use this one
> to verify if we run out of free buffer in avail ring
>        if (y<0 && (y+65535)<4096)
>            y = y+65535
>
>        if(@2=="debugon")
>        {
>            if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback ==
> callback_addr)
>            {
>                netrxfreecount[y] <<< gettimeofday_s()
>
>                printf("no avail ring left seen, printing most recent 5
> num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
>                for(i=recentfreecount; i!=((recentfreecount+4) % 5);
> i=((i+1) % 5))
>                {
>                    printf("index: %d, num free: %d\n", i, recentfree[$vq,
> i])
>                }
>
>                printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
>                //exit()
>            }
>        }
> }
>
>
> probe
> module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
>
> {
>        recentfreecount++
>        recentfreecount = recentfreecount % 5
>        recentfree[$rq->vq, recentfreecount] = $rq->vq->num_free --->
> record the num_free for the last 5 calls to virtnet_receive, so we can
> see if lowering the bar helps.
> }
>
>
> Here is the result:
>
> no avail ring left seen, printing most recent 5 num free, vq:
> ffff9c13c1200000, current index: 1
> index: 1, num free: 561
> index: 2, num free: 305
> index: 3, num free: 369
> index: 4, num free: 433
> index: 0, num free: 497
> no avail ring left seen, printing most recent 5 num free, vq:
> ffff9c13c1200000, current index: 1
> index: 1, num free: 543
> index: 2, num free: 463
> index: 3, num free: 469
> index: 4, num free: 476
> index: 0, num free: 479
> no avail ring left seen, printing most recent 5 num free, vq:
> ffff9c13c1200000, current index: 2
> index: 2, num free: 555
> index: 3, num free: 414
> index: 4, num free: 420
> index: 0, num free: 427
> index: 1, num free: 491
>
> We can see in the last 4 calls to virtnet_receive before we run out
> of free buffer and start to relaim, num_free is quite high. So if we
> can do the reclaim earlier, it will certainly help.
>
> Jiang


Right, but I think there's no need to put those thing in the commit log.

Thanks



^ permalink raw reply

* [net] tipc: fix false detection of retransmit failures
From: Tuong Lien @ 2019-08-15  3:24 UTC (permalink / raw)
  To: davem, jon.maloy, maloy, ying.xue, netdev; +Cc: tipc-discussion

This commit eliminates the use of the link 'stale_limit' & 'prev_from'
(besides the already removed - 'stale_cnt') variables in the detection
of repeated retransmit failures as there is no proper way to initialize
them to avoid a false detection, i.e. it is not really a retransmission
failure but due to a garbage values in the variables.

Instead, a jiffies variable will be added to individual skbs (like the
way we restrict the skb retransmissions) in order to mark the first skb
retransmit time. Later on, at the next retransmissions, the timestamp
will be checked to see if the skb in the link transmq is "too stale",
that is, the link tolerance time has passed, so that a link reset will
be ordered. Note, just checking on the first skb in the queue is fine
enough since it must be the oldest one.
A counter is also added to keep track the actual skb retransmissions'
number for later checking when the failure happens.

The downside of this approach is that the skb->cb[] buffer is about to
be exhausted, however it is always able to allocate another memory area
and keep a reference to it when needed.

Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
---
 net/tipc/link.c | 92 ++++++++++++++++++++++++++++++++-------------------------
 net/tipc/msg.h  |  8 +++--
 2 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index dd3155b14654..01d76bf16e9d 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -106,8 +106,6 @@ struct tipc_stats {
  * @transmitq: queue for sent, non-acked messages
  * @backlogq: queue for messages waiting to be sent
  * @snt_nxt: next sequence number to use for outbound messages
- * @prev_from: sequence number of most previous retransmission request
- * @stale_limit: time when repeated identical retransmits must force link reset
  * @ackers: # of peers that needs to ack each packet before it can be released
  * @acked: # last packet acked by a certain peer. Used for broadcast.
  * @rcv_nxt: next sequence number to expect for inbound messages
@@ -164,9 +162,7 @@ struct tipc_link {
 		u16 limit;
 	} backlog[5];
 	u16 snd_nxt;
-	u16 prev_from;
 	u16 window;
-	unsigned long stale_limit;
 
 	/* Reception */
 	u16 rcv_nxt;
@@ -1063,47 +1059,53 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
  * link_retransmit_failure() - Detect repeated retransmit failures
  * @l: tipc link sender
  * @r: tipc link receiver (= l in case of unicast)
- * @from: seqno of the 1st packet in retransmit request
  * @rc: returned code
  *
  * Return: true if the repeated retransmit failures happens, otherwise
  * false
  */
 static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
-				    u16 from, int *rc)
+				    int *rc)
 {
 	struct sk_buff *skb = skb_peek(&l->transmq);
 	struct tipc_msg *hdr;
 
 	if (!skb)
 		return false;
-	hdr = buf_msg(skb);
 
-	/* Detect repeated retransmit failures on same packet */
-	if (r->prev_from != from) {
-		r->prev_from = from;
-		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
-	} else if (time_after(jiffies, r->stale_limit)) {
-		pr_warn("Retransmission failure on link <%s>\n", l->name);
-		link_print(l, "State of link ");
-		pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
-			msg_user(hdr), msg_type(hdr), msg_size(hdr),
-			msg_errcode(hdr));
-		pr_info("sqno %u, prev: %x, src: %x\n",
-			msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
-
-		trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
-		trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
-		trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
+	if (!TIPC_SKB_CB(skb)->retr_cnt)
+		return false;
 
-		if (link_is_bc_sndlink(l))
-			*rc = TIPC_LINK_DOWN_EVT;
+	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
+			msecs_to_jiffies(r->tolerance)))
+		return false;
+
+	hdr = buf_msg(skb);
+	if (link_is_bc_sndlink(l) && !less(r->acked, msg_seqno(hdr)))
+		return false;
 
+	pr_warn("Retransmission failure on link <%s>\n", l->name);
+	link_print(l, "State of link ");
+	pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
+		msg_user(hdr), msg_type(hdr), msg_size(hdr), msg_errcode(hdr));
+	pr_info("sqno %u, prev: %x, dest: %x\n",
+		msg_seqno(hdr), msg_prevnode(hdr), msg_destnode(hdr));
+	pr_info("retr_stamp %d, retr_cnt %d\n",
+		jiffies_to_msecs(TIPC_SKB_CB(skb)->retr_stamp),
+		TIPC_SKB_CB(skb)->retr_cnt);
+
+	trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
+	trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
+	trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
+
+	if (link_is_bc_sndlink(l)) {
+		r->state = LINK_RESET;
+		*rc = TIPC_LINK_DOWN_EVT;
+	} else {
 		*rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
-		return true;
 	}
 
-	return false;
+	return true;
 }
 
 /* tipc_link_bc_retrans() - retransmit zero or more packets
@@ -1129,7 +1131,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
 
 	trace_tipc_link_retrans(r, from, to, &l->transmq);
 
-	if (link_retransmit_failure(l, r, from, &rc))
+	if (link_retransmit_failure(l, r, &rc))
 		return rc;
 
 	skb_queue_walk(&l->transmq, skb) {
@@ -1138,11 +1140,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
 			continue;
 		if (more(msg_seqno(hdr), to))
 			break;
-		if (link_is_bc_sndlink(l)) {
-			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
-				continue;
-			TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
-		}
+
+		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
+			continue;
+		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
 		_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE, GFP_ATOMIC);
 		if (!_skb)
 			return 0;
@@ -1152,6 +1153,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
 		_skb->priority = TC_PRIO_CONTROL;
 		__skb_queue_tail(xmitq, _skb);
 		l->stats.retransmitted++;
+
+		/* Increase actual retrans counter & mark first time */
+		if (!TIPC_SKB_CB(skb)->retr_cnt++)
+			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
 	}
 	return 0;
 }
@@ -1393,12 +1398,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
 	struct tipc_msg *hdr;
 	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
 	u16 ack = l->rcv_nxt - 1;
+	bool passed = false;
 	u16 seqno, n = 0;
 	int rc = 0;
 
-	if (gap && link_retransmit_failure(l, l, acked + 1, &rc))
-		return rc;
-
 	skb_queue_walk_safe(&l->transmq, skb, tmp) {
 		seqno = buf_seqno(skb);
 
@@ -1408,12 +1411,17 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
 			__skb_unlink(skb, &l->transmq);
 			kfree_skb(skb);
 		} else if (less_eq(seqno, acked + gap)) {
-			/* retransmit skb */
+			/* First, check if repeated retrans failures occurs? */
+			if (!passed && link_retransmit_failure(l, l, &rc))
+				return rc;
+			passed = true;
+
+			/* retransmit skb if unrestricted*/
 			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
 				continue;
 			TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME;
-
-			_skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
+			_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE,
+					   GFP_ATOMIC);
 			if (!_skb)
 				continue;
 			hdr = buf_msg(_skb);
@@ -1422,6 +1430,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
 			_skb->priority = TC_PRIO_CONTROL;
 			__skb_queue_tail(xmitq, _skb);
 			l->stats.retransmitted++;
+
+			/* Increase actual retrans counter & mark first time */
+			if (!TIPC_SKB_CB(skb)->retr_cnt++)
+				TIPC_SKB_CB(skb)->retr_stamp = jiffies;
 		} else {
 			/* retry with Gap ACK blocks if any */
 			if (!ga || n >= ga->gack_cnt)
@@ -2681,7 +2693,7 @@ int tipc_link_dump(struct tipc_link *l, u16 dqueues, char *buf)
 	i += scnprintf(buf + i, sz - i, " %x", l->peer_caps);
 	i += scnprintf(buf + i, sz - i, " %u", l->silent_intv_cnt);
 	i += scnprintf(buf + i, sz - i, " %u", l->rst_cnt);
-	i += scnprintf(buf + i, sz - i, " %u", l->prev_from);
+	i += scnprintf(buf + i, sz - i, " %u", 0);
 	i += scnprintf(buf + i, sz - i, " %u", 0);
 	i += scnprintf(buf + i, sz - i, " %u", l->acked);
 
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 1c8c8dd32a4e..0daa6f04ca81 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -102,13 +102,15 @@ struct plist;
 #define TIPC_MEDIA_INFO_OFFSET	5
 
 struct tipc_skb_cb {
-	u32 bytes_read;
-	u32 orig_member;
 	struct sk_buff *tail;
 	unsigned long nxt_retr;
-	bool validated;
+	unsigned long retr_stamp;
+	u32 bytes_read;
+	u32 orig_member;
 	u16 chain_imp;
 	u16 ackers;
+	u16 retr_cnt;
+	bool validated;
 };
 
 #define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0]))
-- 
2.13.7


^ permalink raw reply related

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-15  3:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel,
	linux-mm
In-Reply-To: <20190813115707.GC29508@ziepe.ca>


On 2019/8/13 下午7:57, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>
>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
>> notifier in this series.
> A GUP that can't sleep can't pagefault which makes it a really weird
> pattern


My understanding is __get_user_pages_fast() assumes caller can fail or 
have fallback. And we have graceful fallback to copy_{to|from}_user().


>
>> Btw, back to the original question. May I know why synchronize_rcu() is not
>> suitable? Consider:
> We already went over this. You'd need to determine it doesn't somehow
> deadlock the mm on reclaim paths. Maybe it is OK, the rcq_gq_wq is
> marked WQ_MEM_RECLAIM at least..


Yes, will take a look at this.


>
> I also think Michael was concerned about the latency spikes a long RCU
> delay would cause.


I don't think it's a real problem consider MMU notifier could be 
preempted or blocked.

Thanks


>
> Jason

^ permalink raw reply

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-15  3:28 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel,
	linux-mm
In-Reply-To: <20190813164105.GD22640@infradead.org>


On 2019/8/14 上午12:41, Christoph Hellwig wrote:
> On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
>> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>>
>>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
>>> notifier in this series.
>> A GUP that can't sleep can't pagefault which makes it a really weird
>> pattern
> get_user_pages/get_user_pages_fast must not be called under a spinlock.
> We have the somewhat misnamed __get_user_page_fast that just does a
> lookup for existing pages and never faults for a few places that need
> to do that lookup from contexts where we can't sleep.


Yes, I do use __get_user_pages_fast() in the code.

Thanks


^ permalink raw reply

* [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert

This patch series introduces XDP_SKIP_BPF flag that can be specified
during the bind() call of an AF_XDP socket to skip calling the BPF 
program in the receive path and pass the buffer directly to the socket.

When a single AF_XDP socket is associated with a queue and a HW
filter is used to redirect the packets and the app is interested in
receiving all the packets on that queue, we don't need an additional 
BPF program to do further filtering or lookup/redirect to a socket.

Here are some performance numbers collected on 
  - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
  - Intel 40Gb Ethernet NIC (i40e)

All tests use 2 cores and the results are in Mpps.

turbo on (default)
---------------------------------------------	
                      no-skip-bpf    skip-bpf
---------------------------------------------	
rxdrop zerocopy           21.9         38.5 
l2fwd  zerocopy           17.0         20.5
rxdrop copy               11.1         13.3
l2fwd  copy                1.9          2.0

no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
---------------------------------------------	
                      no-skip-bpf    skip-bpf
---------------------------------------------	
rxdrop zerocopy           15.4         29.0
l2fwd  zerocopy           11.8         18.2
rxdrop copy                8.2         10.5
l2fwd  copy                1.7          1.7
---------------------------------------------	

Sridhar Samudrala (5):
  xsk: Convert bool 'zc' field in struct xdp_umem to a u32 bitmap
  xsk: Introduce XDP_SKIP_BPF bind option
  i40e: Enable XDP_SKIP_BPF option for AF_XDP sockets
  ixgbe: Enable XDP_SKIP_BPF option for AF_XDP sockets
  xdpsock_user: Add skip_bpf option

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  6 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 20 ++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 16 ++++++-
 include/net/xdp_sock.h                        | 21 ++++++++-
 include/uapi/linux/if_xdp.h                   |  1 +
 include/uapi/linux/xdp_diag.h                 |  1 +
 net/xdp/xdp_umem.c                            |  9 ++--
 net/xdp/xsk.c                                 | 43 ++++++++++++++++---
 net/xdp/xsk_diag.c                            |  5 ++-
 samples/bpf/xdpsock_user.c                    |  8 ++++
 11 files changed, 135 insertions(+), 17 deletions(-)

-- 
2.20.1


^ permalink raw reply

* [PATCH bpf-next 3/5] i40e: Enable XDP_SKIP_BPF option for AF_XDP sockets
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

This patch skips calling BPF program in the receive path if
the queue is associated with UMEM that is not shared and
bound to an AF_XDP socket that has enabled skip bpf during
bind() call.

Here are some performance numbers collected on 
  - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
  - Intel 40Gb Ethernet NIC (i40e)

All tests use 2 cores and the results are in Mpps.

turbo on (default)
---------------------------------------------	
                      no-skip-bpf    skip-bpf
---------------------------------------------	
rxdrop zerocopy           21.9         38.5 
l2fwd  zerocopy           17.0         20.5
rxdrop copy               11.1         13.3
l2fwd  copy                1.9          2.0

no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
---------------------------------------------	
                      no-skip-bpf    skip-bpf
---------------------------------------------	
rxdrop zerocopy           15.4         29.0
l2fwd  zerocopy           11.8         18.2
rxdrop copy                8.2         10.5
l2fwd  copy                1.7          1.7
---------------------------------------------	

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 +++++++++++++++++++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |  6 ++++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e3f29dc8b290..5e63e3644e87 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2199,6 +2199,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 	int err, result = I40E_XDP_PASS;
 	struct i40e_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
+	struct xdp_umem *umem;
 	u32 act;
 
 	rcu_read_lock();
@@ -2209,6 +2210,13 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 
 	prefetchw(xdp->data_hard_start); /* xdp_frame write */
 
+	umem = xdp_get_umem_from_qid(rx_ring->netdev, rx_ring->queue_index);
+	if (xsk_umem_skip_bpf(umem)) {
+		err = xsk_umem_rcv(umem, xdp);
+		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
+		goto xdp_out;
+	}
+
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	switch (act) {
 	case XDP_PASS:
@@ -2303,8 +2311,18 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring,
  **/
 void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
 {
-	if (xdp_res & I40E_XDP_REDIR)
-		xdp_do_flush_map();
+	if (xdp_res & I40E_XDP_REDIR) {
+		struct xdp_umem *umem;
+
+		umem = rx_ring->xsk_umem;
+		if (!umem)
+			umem = xdp_get_umem_from_qid(rx_ring->netdev,
+						     rx_ring->queue_index);
+		if (xsk_umem_skip_bpf(umem))
+			xsk_umem_flush(umem);
+		else
+			xdp_do_flush_map();
+	}
 
 	if (xdp_res & I40E_XDP_TX) {
 		struct i40e_ring *xdp_ring =
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..cc538479c95d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -195,6 +195,12 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
+	if (xsk_umem_skip_bpf(rx_ring->xsk_umem)) {
+		err = xsk_umem_rcv(rx_ring->xsk_umem, xdp);
+		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
+		return result;
+	}
+
 	rcu_read_lock();
 	/* NB! xdp_prog will always be !NULL, due to the fact that
 	 * this path is enabled by setting an XDP program.
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next 2/5] xsk: Introduce XDP_SKIP_BPF bind option
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

This option enables an AF_XDP socket to specify XDP_SKIP_BPF flag
with the bind() call to skip calling the BPF program in the receive
path and pass the XDP buffer directly to the socket.

When a single AF_XDP socket is associated with a queue and a HW
filter is used to redirect the packets and the app is interested in
receiving all the packets on that queue, we don't need an additional 
BPF program to do further filtering or lookup/redirect to a socket.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/xdp_sock.h        |  9 +++++++++
 include/uapi/linux/if_xdp.h   |  1 +
 include/uapi/linux/xdp_diag.h |  1 +
 net/xdp/xdp_umem.c            |  5 ++++-
 net/xdp/xsk.c                 | 31 +++++++++++++++++++++++++++++--
 net/xdp/xsk_diag.c            |  2 ++
 6 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index b6716dbdce1a..ad132a69db7c 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -29,6 +29,7 @@ struct xdp_umem_fq_reuse {
 
 /* Bits for the umem flags field. */
 #define XDP_UMEM_F_ZEROCOPY	(1 << 0)
+#define XDP_UMEM_F_SKIP_BPF	(1 << 1)
 
 struct xdp_umem {
 	struct xsk_queue *fq;
@@ -98,6 +99,9 @@ struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
 void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
 struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
 bool xsk_umem_zerocopy(struct xdp_umem *umem);
+bool xsk_umem_skip_bpf(struct xdp_umem *umem);
+void xsk_umem_flush(struct xdp_umem *umem);
+int xsk_umem_rcv(struct xdp_umem *umem, struct xdp_buff *xdp);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
@@ -221,6 +225,11 @@ static inline bool xsk_umem_zerocopy(struct xdp_umem *umem)
 	return false;
 }
 
+static inline bool xsk_umem_skip_bpf(struct xdp_umem *umem)
+{
+	return false;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..881447ebf3c9 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -16,6 +16,7 @@
 #define XDP_SHARED_UMEM	(1 << 0)
 #define XDP_COPY	(1 << 1) /* Force copy-mode */
 #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
+#define XDP_SKIP_BPF	(1 << 3) /* Skip running BPF program */
 
 struct sockaddr_xdp {
 	__u16 sxdp_family;
diff --git a/include/uapi/linux/xdp_diag.h b/include/uapi/linux/xdp_diag.h
index 78b2591a7782..6caf3d9c9abe 100644
--- a/include/uapi/linux/xdp_diag.h
+++ b/include/uapi/linux/xdp_diag.h
@@ -56,6 +56,7 @@ struct xdp_diag_ring {
 };
 
 #define XDP_DU_F_ZEROCOPY (1 << 0)
+#define XDP_DU_F_SKIP_BPF (2 << 0)
 
 struct xdp_diag_umem {
 	__u64	size;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 411b3e3498c4..cbc02509dc90 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -106,6 +106,9 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	umem->dev = dev;
 	umem->queue_id = queue_id;
 
+	if (flags & XDP_SKIP_BPF)
+		umem->flags |= XDP_UMEM_F_SKIP_BPF;
+
 	dev_hold(dev);
 
 	if (force_copy)
@@ -162,7 +165,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 	dev_put(umem->dev);
 	umem->dev = NULL;
-	umem->flags &= ~XDP_UMEM_F_ZEROCOPY;
+	umem->flags &= ~(XDP_UMEM_F_ZEROCOPY | XDP_UMEM_F_SKIP_BPF);
 }
 
 static void xdp_umem_unmap_pages(struct xdp_umem *umem)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ca95676ef75d..bcb6a77fae22 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -166,6 +166,27 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return err;
 }
 
+void xsk_umem_flush(struct xdp_umem *umem)
+{
+	struct xdp_sock *xs;
+
+	if (!list_empty(&umem->xsk_list)) {
+		xs = list_first_entry(&umem->xsk_list, struct xdp_sock, list);
+		xsk_flush(xs);
+	}
+}
+EXPORT_SYMBOL(xsk_umem_flush);
+
+int xsk_umem_rcv(struct xdp_umem *umem, struct xdp_buff *xdp)
+{
+	struct xdp_sock *xs;
+
+	xs = list_first_entry(&umem->xsk_list, struct xdp_sock, list);
+	xdp->handle += xdp->data - xdp->data_hard_start;
+	return xsk_rcv(xs, xdp);
+}
+EXPORT_SYMBOL(xsk_umem_rcv);
+
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
 {
 	xskq_produce_flush_addr_n(umem->cq, nb_entries);
@@ -301,6 +322,12 @@ bool xsk_umem_zerocopy(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_umem_zerocopy);
 
+bool xsk_umem_skip_bpf(struct xdp_umem *umem)
+{
+	return (umem && (umem->flags & XDP_UMEM_F_SKIP_BPF));
+}
+EXPORT_SYMBOL(xsk_umem_skip_bpf);
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
@@ -434,7 +461,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		return -EINVAL;
 
 	flags = sxdp->sxdp_flags;
-	if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY))
+	if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY | XDP_SKIP_BPF))
 		return -EINVAL;
 
 	rtnl_lock();
@@ -461,7 +488,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct xdp_sock *umem_xs;
 		struct socket *sock;
 
-		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY)) {
+		if (flags & (XDP_COPY | XDP_ZEROCOPY | XDP_SKIP_BPF)) {
 			/* Cannot specify flags for shared sockets. */
 			err = -EINVAL;
 			goto out_unlock;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index 8a19b7e87cfb..f6f4b7912a22 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -64,6 +64,8 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.flags = 0;
 	if (xsk_umem_zerocopy(umem))
 		du.flags |= XDP_DU_F_ZEROCOPY;
+	if (xsk_umem_skip_bpf(umem))
+		du.flags |= XDP_DU_F_SKIP_BPF;
 	du.refs = refcount_read(&umem->users);
 
 	err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du);
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next 5/5] xdpsock_user: Add skip_bpf option
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 samples/bpf/xdpsock_user.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..509fc6a18af9 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -123,6 +123,9 @@ static void print_benchmark(bool running)
 	if (opt_poll)
 		printf("poll() ");
 
+	if (opt_xdp_bind_flags & XDP_SKIP_BPF)
+		printf("skip-bpf ");
+
 	if (running) {
 		printf("running...");
 		fflush(stdout);
@@ -352,6 +355,7 @@ static struct option long_options[] = {
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
 	{"frame-size", required_argument, 0, 'f'},
+	{"skip-bpf", no_argument, 0, 's'},
 	{0, 0, 0, 0}
 };
 
@@ -372,6 +376,7 @@ static void usage(const char *prog)
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
 		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
+		"  -s, --skip-bpf       Skip running bpf program.\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
@@ -430,6 +435,9 @@ static void parse_command_line(int argc, char **argv)
 		case 'f':
 			opt_xsk_frame_size = atoi(optarg);
 			break;
+		case 's':
+			opt_xdp_bind_flags |= XDP_SKIP_BPF;
+			break;
 		default:
 			usage(basename(argv[0]));
 		}
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next 4/5] ixgbe: Enable XDP_SKIP_BPF option for AF_XDP sockets
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

This patch skips calling BPF program in the receive path if
the queue is associated with UMEM that is not shared and
bound to an AF_XDP socket that has enabled skip bpf during
bind() call.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 20 +++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 16 +++++++++++++--
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index dc7b128c780e..594792860cdd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2197,6 +2197,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	int err, result = IXGBE_XDP_PASS;
 	struct bpf_prog *xdp_prog;
 	struct xdp_frame *xdpf;
+	struct xdp_umem *umem;
 	u32 act;
 
 	rcu_read_lock();
@@ -2207,6 +2208,13 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 
 	prefetchw(xdp->data_hard_start); /* xdp_frame write */
 
+	umem = xdp_get_umem_from_qid(rx_ring->netdev, rx_ring->queue_index);
+	if (xsk_umem_skip_bpf(umem)) {
+		err = xsk_umem_rcv(umem, xdp);
+		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
+		goto xdp_out;
+	}
+
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	switch (act) {
 	case XDP_PASS:
@@ -2400,8 +2408,16 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit & IXGBE_XDP_REDIR)
-		xdp_do_flush_map();
+	if (xdp_xmit & IXGBE_XDP_REDIR) {
+		struct xdp_umem *umem;
+
+		umem = xdp_get_umem_from_qid(rx_ring->netdev,
+					     rx_ring->queue_index);
+		if (xsk_umem_skip_bpf(umem))
+			xsk_umem_flush(umem);
+		else
+			xdp_do_flush_map();
+	}
 
 	if (xdp_xmit & IXGBE_XDP_TX) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..9ea8a769d7a8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -148,6 +148,12 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	struct xdp_frame *xdpf;
 	u32 act;
 
+	if (xsk_umem_skip_bpf(rx_ring->xsk_umem)) {
+		err = xsk_umem_rcv(rx_ring->xsk_umem, xdp);
+		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
+		return result;
+	}
+
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
@@ -527,8 +533,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 		ixgbe_rx_skb(q_vector, skb);
 	}
 
-	if (xdp_xmit & IXGBE_XDP_REDIR)
-		xdp_do_flush_map();
+	if (xdp_xmit & IXGBE_XDP_REDIR) {
+		struct xdp_umem *umem = rx_ring->xsk_umem;
+
+		if (xsk_umem_skip_bpf(umem))
+			xsk_umem_flush(umem);
+		else
+			xdp_do_flush_map();
+	}
 
 	if (xdp_xmit & IXGBE_XDP_TX) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next 1/5] xsk: Convert bool 'zc' field in struct xdp_umem to a u32 bitmap
From: Sridhar Samudrala @ 2019-08-15  3:46 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev, bpf, sridhar.samudrala,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

The bool 'zc' field in struct xdp_uem is replaced with a u32 flags
field and a bit within flags is used to indicate zerocopy. This flags
field will be used in later patches for other bit fields.

Also, removed the bool 'zc' field from struct xdp_sock as it can be
accessed via flags in xs->umem.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/xdp_sock.h | 12 ++++++++++--
 net/xdp/xdp_umem.c     |  6 +++---
 net/xdp/xsk.c          | 12 +++++++++---
 net/xdp/xsk_diag.c     |  3 ++-
 4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..b6716dbdce1a 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -27,6 +27,9 @@ struct xdp_umem_fq_reuse {
 	u64 handles[];
 };
 
+/* Bits for the umem flags field. */
+#define XDP_UMEM_F_ZEROCOPY	(1 << 0)
+
 struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
@@ -45,7 +48,7 @@ struct xdp_umem {
 	struct net_device *dev;
 	struct xdp_umem_fq_reuse *fq_reuse;
 	u16 queue_id;
-	bool zc;
+	u32 flags;
 	spinlock_t xsk_list_lock;
 	struct list_head xsk_list;
 };
@@ -58,7 +61,6 @@ struct xdp_sock {
 	struct xdp_umem *umem;
 	struct list_head flush_node;
 	u16 queue_id;
-	bool zc;
 	enum {
 		XSK_READY = 0,
 		XSK_BOUND,
@@ -95,6 +97,7 @@ struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
 					  struct xdp_umem_fq_reuse *newq);
 void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
 struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
+bool xsk_umem_zerocopy(struct xdp_umem *umem);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
@@ -213,6 +216,11 @@ static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
 	return NULL;
 }
 
+static inline bool xsk_umem_zerocopy(struct xdp_umem *umem)
+{
+	return false;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index a0607969f8c0..411b3e3498c4 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -126,7 +126,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (err)
 		goto err_unreg_umem;
 
-	umem->zc = true;
+	umem->flags |= XDP_UMEM_F_ZEROCOPY;
 	return 0;
 
 err_unreg_umem:
@@ -147,7 +147,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 	if (!umem->dev)
 		return;
 
-	if (umem->zc) {
+	if (xsk_umem_zerocopy(umem)) {
 		bpf.command = XDP_SETUP_XSK_UMEM;
 		bpf.xsk.umem = NULL;
 		bpf.xsk.queue_id = umem->queue_id;
@@ -162,7 +162,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 	dev_put(umem->dev);
 	umem->dev = NULL;
-	umem->zc = false;
+	umem->flags &= ~XDP_UMEM_F_ZEROCOPY;
 }
 
 static void xdp_umem_unmap_pages(struct xdp_umem *umem)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..ca95676ef75d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -295,6 +295,12 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	return err;
 }
 
+bool xsk_umem_zerocopy(struct xdp_umem *umem)
+{
+	return (umem && (umem->flags & XDP_UMEM_F_ZEROCOPY));
+}
+EXPORT_SYMBOL(xsk_umem_zerocopy);
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
@@ -310,7 +316,8 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (need_wait)
 		return -EOPNOTSUPP;
 
-	return (xs->zc) ? xsk_zc_xmit(sk) : xsk_generic_xmit(sk, m, total_len);
+	return xsk_umem_zerocopy(xs->umem) ? xsk_zc_xmit(sk) :
+					     xsk_generic_xmit(sk, m, total_len);
 }
 
 static unsigned int xsk_poll(struct file *file, struct socket *sock,
@@ -503,7 +510,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	}
 
 	xs->dev = dev;
-	xs->zc = xs->umem->zc;
 	xs->queue_id = qid;
 	xskq_set_umem(xs->rx, xs->umem->size, xs->umem->chunk_mask);
 	xskq_set_umem(xs->tx, xs->umem->size, xs->umem->chunk_mask);
@@ -683,7 +689,7 @@ static int xsk_getsockopt(struct socket *sock, int level, int optname,
 			return -EINVAL;
 
 		mutex_lock(&xs->mutex);
-		if (xs->zc)
+		if (xsk_umem_zerocopy(xs->umem))
 			opts.flags |= XDP_OPTIONS_ZEROCOPY;
 		mutex_unlock(&xs->mutex);
 
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..8a19b7e87cfb 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -10,6 +10,7 @@
 #include <net/xdp_sock.h>
 #include <linux/xdp_diag.h>
 #include <linux/sock_diag.h>
+#include <net/xdp_sock.h>
 
 #include "xsk_queue.h"
 #include "xsk.h"
@@ -61,7 +62,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
 	du.queue_id = umem->queue_id;
 	du.flags = 0;
-	if (umem->zc)
+	if (xsk_umem_zerocopy(umem))
 		du.flags |= XDP_DU_F_ZEROCOPY;
 	du.refs = refcount_read(&umem->users);
 
-- 
2.20.1


^ permalink raw reply related

* Re: tun: mark small packets as owned by the tap sock
From: Jason Wang @ 2019-08-15  5:11 UTC (permalink / raw)
  To: Dave Jones; +Cc: Alexis Bauvin, netdev
In-Reply-To: <20190813140025.GA17823@codemonkey.org.uk>


On 2019/8/13 下午10:00, Dave Jones wrote:
> On Tue, Aug 13, 2019 at 04:33:59PM +0800, Jason Wang wrote:
>   >
>   > On 2019/8/13 上午6:19, Dave Jones wrote:
>   > > On Wed, Aug 07, 2019 at 12:30:07AM +0000, Linux Kernel wrote:
>   > >   > Commit:     4b663366246be1d1d4b1b8b01245b2e88ad9e706
>   > >   > Parent:     16b2084a8afa1432d14ba72b7c97d7908e178178
>   > >   > Web:        https://git.kernel.org/torvalds/c/4b663366246be1d1d4b1b8b01245b2e88ad9e706
>   > >   > Author:     Alexis Bauvin <abauvin@scaleway.com>
>   > >   > AuthorDate: Tue Jul 23 16:23:01 2019 +0200
>   > >   >
>   > >   >     tun: mark small packets as owned by the tap sock
>   > >   >
>   > >   >     - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
>   > >
>   > > This commit breaks ipv6 routing when I deployed on it a linode.
>   > > It seems to work briefly after boot, and then silently all packets get
>   > > dropped. (Presumably, it's dropping RA or ND packets)
>   > >
>   > > With this reverted, everything works as it did in rc3.
>   > >
>   > Two questions:
>   >
>   > - Are you using XDP for TUN?
>
> not knowingly.
> $ grep XDP .config
> # CONFIG_XDP_SOCKETS is not set
>
> What's configured on the hypervisor side I have no idea.


Ok, please tell me more about your setups:

- Are you using TUN in host or guest?

- Are you using it for VM or VPN(tunneling)?

- Where did the packet get dropped?


>
>   > - Does it work before 66ccbc9c87c2?
>
> that's been around since 4.14-rc1, and at one point it ran whatever was
> in debian9 (4.9).  I don't recall it ever not working, so I'd say yes.
>
> I can build a 4.13 if it'll prove something, but it'll take me a while.
> (This is my primary MX, so it's dropping email while it's on the broken
>   kernel, so I need to plan some time to be around to babysit it)


If possible please try that.


>
>   > If yes, could you show us the result of net_dropmonitor?
>
> where do I get that?  It doesn't seem packaged for debian.
>
> 	Dave


It's part of perf-script(1). You can simply start it through perf script 
record net_dropmonitor.

Thanks

>

^ permalink raw reply

* [PATCH] Bluetooth: 6lowpan: Make variable header_ops constant
From: Nishka Dasgupta @ 2019-08-15  5:52 UTC (permalink / raw)
  To: marcel, johan.hedberg, linux-bluetooth, linux-kernel, davem,
	netdev
  Cc: Nishka Dasgupta

Static variable header_ops, of type header_ops, is used only once, when
it is assigned to field header_ops of a variable having type net_device.
This corresponding field is declared as const in the definition of
net_device. Hence make header_ops constant as well to protect it from
unnecessary modification.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
---
 net/bluetooth/6lowpan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 9d41de1ec90f..bb55d92691b0 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -583,7 +583,7 @@ static const struct net_device_ops netdev_ops = {
 	.ndo_start_xmit		= bt_xmit,
 };
 
-static struct header_ops header_ops = {
+static const struct header_ops header_ops = {
 	.create	= header_create,
 };
 
-- 
2.19.1


^ permalink raw reply related

* Re: [RFC PATCH 1/5] x86: tsc: add tsc to art helpers
From: Felipe Balbi @ 2019-08-15  5:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Richard Cochran, netdev, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <alpine.DEB.2.21.1907160952040.1767@nanos.tec.linutronix.de>


Hi,


Thomas Gleixner <tglx@linutronix.de> writes:

> Felipe,
>
> On Tue, 16 Jul 2019, Felipe Balbi wrote:
>
> -ENOCHANGELOG
>
> As you said in the cover letter:
>
>>  (3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length
>>      before going in.
>
> So some information what those interfaces are used for and why they are
> needed would be really helpful.

Okay, I have some more details about this. The TGPIO device itself uses
ART since TSC is not directly available to anything other than the
CPU. The 'problem' here is that reading ART incurs extra latency which
we would like to avoid. Therefore, we use TSC and scale it to
nanoseconds which, would be the same as ART to ns.

>> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns)
>> +{
>> +	u64 tmp, res, rem;
>> +	u64 cycles;
>> +
>> +	tsc_counterval->cycles = clocksource_tsc.read(NULL);
>> +	cycles = tsc_counterval->cycles;
>> +	tsc_counterval->cs = art_related_clocksource;
>> +
>> +	rem = do_div(cycles, tsc_khz);
>> +
>> +	res = cycles * USEC_PER_SEC;
>> +	tmp = rem * USEC_PER_SEC;
>> +
>> +	do_div(tmp, tsc_khz);
>> +	res += tmp;
>> +
>> +	*tsc_ns = res;
>> +}
>> +EXPORT_SYMBOL(get_tsc_ns);
>> +
>> +u64 get_art_ns_now(void)
>> +{
>> +	struct system_counterval_t tsc_cycles;
>> +	u64 tsc_ns;
>> +
>> +	get_tsc_ns(&tsc_cycles, &tsc_ns);
>> +
>> +	return tsc_ns;
>> +}
>> +EXPORT_SYMBOL(get_art_ns_now);
>
> While the changes look innocuous I'm missing the big picture why this needs
> to emulate ART instead of simply using TSC directly.

i don't think we're emulating ART here (other than the name in the
function). We're just reading TSC and converting to nanoseconds, right?

Cheers

-- 
balbi

^ permalink raw reply

* Re: [net-next  1/1] tipc: clean up skb list lock handling on send path
From: Xin Long @ 2019-08-15  5:57 UTC (permalink / raw)
  To: Jon Maloy
  Cc: davem, netdev, tung q nguyen, hoang h le, shuali, ying xue,
	edumazet, tipc-discussion
In-Reply-To: <1565794548-15425-1-git-send-email-jon.maloy@ericsson.com>



----- Original Message -----
> The policy for handling the skb list locks on the send and receive paths
> is simple.
> 
> - On the send path we never need to grab the lock on the 'xmitq' list
>   when the destination is an exernal node.
> 
> - On the receive path we always need to grab the lock on the 'inputq'
>   list, irrespective of source node.
> 
> However, when transmitting node local messages those will eventually
> end up on the receive path of a local socket, meaning that the argument
> 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in  the
> function tipc_sk_rcv(). This has been handled by always initializing
> the spinlock of the 'xmitq' list at message creation, just in case it
> may end up on the receive path later, and despite knowing that the lock
> in most cases never will be used.
> 
> This approach is inaccurate and confusing, and has also concealed the
> fact that the stated 'no lock grabbing' policy for the send path is
> violated in some cases.
> 
> We now clean up this by never initializing the lock at message creation,
> instead doing this at the moment we find that the message actually will
> enter the receive path. At the same time we fix the four locations
> where we incorrectly access the spinlock on the send/error path.
> 
> This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
> is initialised") which has now become redundant.
> 
> CC: Eric Dumazet <edumazet@google.com>
> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> ---
>  net/tipc/bcast.c      | 10 +++++-----
>  net/tipc/link.c       |  4 ++--
>  net/tipc/name_distr.c |  2 +-
>  net/tipc/node.c       |  7 ++++---
>  net/tipc/socket.c     | 14 +++++++-------
>  5 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 34f3e56..6ef1abd 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -185,7 +185,7 @@ static void tipc_bcbase_xmit(struct net *net, struct
> sk_buff_head *xmitq)
>  	}
>  
>  	/* We have to transmit across all bearers */
> -	skb_queue_head_init(&_xmitq);
> +	__skb_queue_head_init(&_xmitq);
>  	for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
>  		if (!bb->dests[bearer_id])
>  			continue;
> @@ -256,7 +256,7 @@ static int tipc_bcast_xmit(struct net *net, struct
> sk_buff_head *pkts,
>  	struct sk_buff_head xmitq;
>  	int rc = 0;
>  
> -	skb_queue_head_init(&xmitq);
> +	__skb_queue_head_init(&xmitq);
>  	tipc_bcast_lock(net);
>  	if (tipc_link_bc_peers(l))
>  		rc = tipc_link_xmit(l, pkts, &xmitq);
> @@ -286,7 +286,7 @@ static int tipc_rcast_xmit(struct net *net, struct
> sk_buff_head *pkts,
>  	u32 dnode, selector;
>  
>  	selector = msg_link_selector(buf_msg(skb_peek(pkts)));
> -	skb_queue_head_init(&_pkts);
> +	__skb_queue_head_init(&_pkts);
>  
>  	list_for_each_entry_safe(dst, tmp, &dests->list, list) {
>  		dnode = dst->node;
> @@ -344,7 +344,7 @@ static int tipc_mcast_send_sync(struct net *net, struct
> sk_buff *skb,
>  	msg_set_size(_hdr, MCAST_H_SIZE);
>  	msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
>  
> -	skb_queue_head_init(&tmpq);
> +	__skb_queue_head_init(&tmpq);
>  	__skb_queue_tail(&tmpq, _skb);
>  	if (method->rcast)
>  		tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
> @@ -378,7 +378,7 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head
> *pkts,
>  	int rc = 0;
>  
>  	skb_queue_head_init(&inputq);
> -	skb_queue_head_init(&localq);
> +	__skb_queue_head_init(&localq);
>  
>  	/* Clone packets before they are consumed by next call */
>  	if (dests->local && !tipc_msg_reassemble(pkts, &localq)) {
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index dd3155b..ba057a9 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -959,7 +959,7 @@ int tipc_link_xmit(struct tipc_link *l, struct
> sk_buff_head *list,
>  		pr_warn("Too large msg, purging xmit list %d %d %d %d %d!\n",
>  			skb_queue_len(list), msg_user(hdr),
>  			msg_type(hdr), msg_size(hdr), mtu);
> -		skb_queue_purge(list);
> +		__skb_queue_purge(list);
>  		return -EMSGSIZE;
>  	}
>  
> @@ -988,7 +988,7 @@ int tipc_link_xmit(struct tipc_link *l, struct
> sk_buff_head *list,
>  		if (likely(skb_queue_len(transmq) < maxwin)) {
>  			_skb = skb_clone(skb, GFP_ATOMIC);
>  			if (!_skb) {
> -				skb_queue_purge(list);
> +				__skb_queue_purge(list);
>  				return -ENOBUFS;
>  			}
>  			__skb_dequeue(list);
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index 44abc8e..61219f0 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32 dnode)
>  	struct name_table *nt = tipc_name_table(net);
>  	struct sk_buff_head head;
>  
> -	skb_queue_head_init(&head);
> +	__skb_queue_head_init(&head);
>  
>  	read_lock_bh(&nt->cluster_scope_lock);
>  	named_distribute(net, &head, dnode, &nt->cluster_scope);
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 1bdcf0f..c8f6177 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1444,13 +1444,14 @@ int tipc_node_xmit(struct net *net, struct
> sk_buff_head *list,
>  
>  	if (in_own_node(net, dnode)) {
>  		tipc_loopback_trace(net, list);
> +		spin_lock_init(&list->lock);
>  		tipc_sk_rcv(net, list);
>  		return 0;
>  	}
>  
>  	n = tipc_node_find(net, dnode);
>  	if (unlikely(!n)) {
> -		skb_queue_purge(list);
> +		__skb_queue_purge(list);
>  		return -EHOSTUNREACH;
>  	}
>  
> @@ -1459,7 +1460,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head
> *list,
>  	if (unlikely(bearer_id == INVALID_BEARER_ID)) {
>  		tipc_node_read_unlock(n);
>  		tipc_node_put(n);
> -		skb_queue_purge(list);
> +		__skb_queue_purge(list);
>  		return -EHOSTUNREACH;
>  	}
>  
> @@ -1491,7 +1492,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff
> *skb, u32 dnode,
>  {
>  	struct sk_buff_head head;
>  
> -	skb_queue_head_init(&head);
> +	__skb_queue_head_init(&head);
>  	__skb_queue_tail(&head, skb);
>  	tipc_node_xmit(net, &head, dnode, selector);
>  	return 0;
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 83ae41d..3b9f8cc 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -809,7 +809,7 @@ static int tipc_sendmcast(struct  socket *sock, struct
> tipc_name_seq *seq,
>  	msg_set_nameupper(hdr, seq->upper);
>  
>  	/* Build message as chain of buffers */
> -	skb_queue_head_init(&pkts);
> +	__skb_queue_head_init(&pkts);
>  	rc = tipc_msg_build(hdr, msg, 0, dlen, mtu, &pkts);
>  
>  	/* Send message if build was successful */
> @@ -853,7 +853,7 @@ static int tipc_send_group_msg(struct net *net, struct
> tipc_sock *tsk,
>  	msg_set_grp_bc_seqno(hdr, bc_snd_nxt);
>  
>  	/* Build message as chain of buffers */
> -	skb_queue_head_init(&pkts);
> +	__skb_queue_head_init(&pkts);
>  	mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>  	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>  	if (unlikely(rc != dlen))
> @@ -1058,7 +1058,7 @@ static int tipc_send_group_bcast(struct socket *sock,
> struct msghdr *m,
>  	msg_set_grp_bc_ack_req(hdr, ack);
>  
>  	/* Build message as chain of buffers */
> -	skb_queue_head_init(&pkts);
> +	__skb_queue_head_init(&pkts);
>  	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>  	if (unlikely(rc != dlen))
>  		return rc;
> @@ -1387,7 +1387,7 @@ static int __tipc_sendmsg(struct socket *sock, struct
> msghdr *m, size_t dlen)
>  	if (unlikely(rc))
>  		return rc;
>  
> -	skb_queue_head_init(&pkts);
> +	__skb_queue_head_init(&pkts);
>  	mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>  	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>  	if (unlikely(rc != dlen))
> @@ -1445,7 +1445,7 @@ static int __tipc_sendstream(struct socket *sock,
> struct msghdr *m, size_t dlen)
>  	int send, sent = 0;
>  	int rc = 0;
>  
> -	skb_queue_head_init(&pkts);
> +	__skb_queue_head_init(&pkts);
>  
>  	if (unlikely(dlen > INT_MAX))
>  		return -EMSGSIZE;
> @@ -1805,7 +1805,7 @@ static int tipc_recvmsg(struct socket *sock, struct
> msghdr *m,
>  
>  	/* Send group flow control advertisement when applicable */
>  	if (tsk->group && msg_in_group(hdr) && !grp_evt) {
> -		skb_queue_head_init(&xmitq);
> +		__skb_queue_head_init(&xmitq);
>  		tipc_group_update_rcv_win(tsk->group, tsk_blocks(hlen + dlen),
>  					  msg_orignode(hdr), msg_origport(hdr),
>  					  &xmitq);
> @@ -2674,7 +2674,7 @@ static void tipc_sk_timeout(struct timer_list *t)
>  	struct sk_buff_head list;
>  	int rc = 0;
>  
> -	skb_queue_head_init(&list);
> +	__skb_queue_head_init(&list);
>  	bh_lock_sock(sk);
>  
>  	/* Try again later if socket is busy */
> --
> 2.1.4
> 
> 
Patch looks good, can you also check those tmp tx queues in:

  tipc_group_cong()
  tipc_group_join()
  tipc_link_create_dummy_tnl_msg()
  tipc_link_tnl_prepare()

which are using skb_queue_head_init() to init?

Thanks.

^ permalink raw reply

* Re: [patch net-next v2 2/2] selftests: netdevsim: add devlink params tests
From: Jiri Pirko @ 2019-08-15  5:58 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190814180900.71712d88@cakuba.netronome.com>

Thu, Aug 15, 2019 at 03:09:00AM CEST, jakub.kicinski@netronome.com wrote:
>On Wed, 14 Aug 2019 17:26:04 +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Test recently added netdevsim devlink param implementation.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> v1->v2:
>> -using cmd_jq helper
>
>Still failing here :(

Ugh :/

>
># ./devlink.sh 
>TEST: fw flash test                                                 [ OK ]
>TEST: params test                                                   [FAIL]
>	Failed to get test1 param value
>TEST: regions test                                                  [ OK ]
>
># jq --version
>jq-1.5-1-a5b5cbe
># echo '{ "a" : false }' | jq -e -r '.[]'
>false
># echo $?
>1
>
>On another machine:
>
>$ echo '{ "a" : false }' | jq -e -r '.[]'
>false
>$ echo $?
>1
>
>Did you mean to drop the -e ?

No. -e is needed in order to jq return error in case there is no output.
Looks like a bug in jq 1.6 fixed. How about I add a check for jq >= 1.6?

^ permalink raw reply

* [PATCH net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode
From: Hangbin Liu @ 2019-08-15  6:09 UTC (permalink / raw)
  To: netdev
  Cc: Stefano Brivio, wenxu, Alexei Starovoitov, David S . Miller,
	Hangbin Liu

When we send a packet larger than PMTU, we need to reply with
icmp_send(ICMP_FRAG_NEEDED) or icmpv6_send(ICMPV6_PKT_TOOBIG).

But in collect_md mode, kernel will crash while accessing the dst dev
as __metadata_dst_init() init dst->dev to NULL by default. Here is what
the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv4
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmp_send
      - net = dev_net(rt->dst.dev); <-- here
  - ip6gre_xmit_ipv6
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
      ...
      - decode_session4
        - oif = skb_dst(skb)->dev->ifindex; <-- here
      - decode_session6
        - oif = skb_dst(skb)->dev->ifindex; <-- here

Fix it by updating the dst dev if not set.

The reproducer is easy:

ovs-vsctl add-br br0
ip link set br0 up
ovs-vsctl add-port br0 gre0 -- \
	  set interface gre0 type=gre options:remote_ip=$dst_addr
ip link set gre0 up
ip addr add ${local_gre6}/64 dev br0
ping6 $remote_gre6 -s 1500

The kernel will crash like
[40595.821651] BUG: kernel NULL pointer dereference, address: 0000000000000108
[40595.822411] #PF: supervisor read access in kernel mode
[40595.822949] #PF: error_code(0x0000) - not-present page
[40595.823492] PGD 0 P4D 0
[40595.823767] Oops: 0000 [#1] SMP PTI
[40595.824139] CPU: 0 PID: 2831 Comm: handler12 Not tainted 5.2.0 #57
[40595.824788] Hardware name: Red Hat KVM, BIOS 1.11.1-3.module+el8.1.0+2983+b2ae9c0a 04/01/2014
[40595.825680] RIP: 0010:__xfrm_decode_session+0x6b/0x930
[40595.826219] Code: b7 c0 00 00 00 b8 06 00 00 00 66 85 d2 0f b7 ca 48 0f 45 c1 44 0f b6 2c 06 48 8b 47 58 48 83 e0 fe 0f 84 f4 04 00 00 48 8b 00 <44> 8b 80 08 01 00 00 41 f6 c4 01 4c 89 e7
ba 58 00 00 00 0f 85 47
[40595.828155] RSP: 0018:ffffc90000a73438 EFLAGS: 00010286
[40595.828705] RAX: 0000000000000000 RBX: ffff8881329d7100 RCX: 0000000000000000
[40595.829450] RDX: 0000000000000000 RSI: ffff8881339e70ce RDI: ffff8881329d7100
[40595.830191] RBP: ffffc90000a73470 R08: 0000000000000000 R09: 000000000000000a
[40595.830936] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a73490
[40595.831682] R13: 000000000000002c R14: ffff888132ff1301 R15: ffff8881329d7100
[40595.832427] FS:  00007f5bfcfd6700(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
[40595.833266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40595.833883] CR2: 0000000000000108 CR3: 000000013a368000 CR4: 00000000000006f0
[40595.834633] Call Trace:
[40595.835392]  ? rt6_multipath_hash+0x4c/0x390
[40595.835853]  icmpv6_route_lookup+0xcb/0x1d0
[40595.836296]  ? icmpv6_xrlim_allow+0x3e/0x140
[40595.836751]  icmp6_send+0x537/0x840
[40595.837125]  icmpv6_send+0x20/0x30
[40595.837494]  tnl_update_pmtu.isra.27+0x19d/0x2a0 [ip_tunnel]
[40595.838088]  ip_md_tunnel_xmit+0x1b6/0x510 [ip_tunnel]
[40595.838633]  gre_tap_xmit+0x10c/0x160 [ip_gre]
[40595.839103]  dev_hard_start_xmit+0x93/0x200
[40595.839551]  sch_direct_xmit+0x101/0x2d0
[40595.839967]  __dev_queue_xmit+0x69f/0x9c0
[40595.840399]  do_execute_actions+0x1717/0x1910 [openvswitch]
[40595.840987]  ? validate_set.isra.12+0x2f5/0x3d0 [openvswitch]
[40595.841596]  ? reserve_sfa_size+0x31/0x130 [openvswitch]
[40595.842154]  ? __ovs_nla_copy_actions+0x1b4/0xad0 [openvswitch]
[40595.842778]  ? __kmalloc_reserve.isra.50+0x2e/0x80
[40595.843285]  ? should_failslab+0xa/0x20
[40595.843696]  ? __kmalloc+0x188/0x220
[40595.844078]  ? __alloc_skb+0x97/0x270
[40595.844472]  ovs_execute_actions+0x47/0x120 [openvswitch]
[40595.845041]  ovs_packet_cmd_execute+0x27d/0x2b0 [openvswitch]
[40595.845648]  genl_family_rcv_msg+0x3a8/0x430
[40595.846101]  genl_rcv_msg+0x47/0x90
[40595.846476]  ? __alloc_skb+0x83/0x270
[40595.846866]  ? genl_family_rcv_msg+0x430/0x430
[40595.847335]  netlink_rcv_skb+0xcb/0x100
[40595.847777]  genl_rcv+0x24/0x40
[40595.848113]  netlink_unicast+0x17f/0x230
[40595.848535]  netlink_sendmsg+0x2ed/0x3e0
[40595.848951]  sock_sendmsg+0x4f/0x60
[40595.849323]  ___sys_sendmsg+0x2bd/0x2e0
[40595.849733]  ? sock_poll+0x6f/0xb0
[40595.850098]  ? ep_scan_ready_list.isra.14+0x20b/0x240
[40595.850634]  ? _cond_resched+0x15/0x30
[40595.851032]  ? ep_poll+0x11b/0x440
[40595.851401]  ? _copy_to_user+0x22/0x30
[40595.851799]  __sys_sendmsg+0x58/0xa0
[40595.852180]  do_syscall_64+0x5b/0x190
[40595.852574]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40595.853105] RIP: 0033:0x7f5c00038c7d
[40595.853489] Code: c7 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e f7 ff ff 48 89 04 24 b8 2e 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 d7 f7 ff ff 48 89
d0 48 83 c4 08 48 3d 01
[40595.855443] RSP: 002b:00007f5bfcf73c00 EFLAGS: 00003293 ORIG_RAX: 000000000000002e
[40595.856244] RAX: ffffffffffffffda RBX: 00007f5bfcf74a60 RCX: 00007f5c00038c7d
[40595.856990] RDX: 0000000000000000 RSI: 00007f5bfcf73c60 RDI: 0000000000000015
[40595.857736] RBP: 0000000000000004 R08: 0000000000000b7c R09: 0000000000000110
[40595.858613] R10: 0001000800050004 R11: 0000000000003293 R12: 000055c2d8329da0
[40595.859401] R13: 00007f5bfcf74120 R14: 0000000000000347 R15: 00007f5bfcf73c60
[40595.860185] Modules linked in: ip_gre ip_tunnel gre openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc bochs_drm ttm drm_kms_helper drm pcspkr joydev i2c_piix4 qemu_fw_cfg xfs libcrc32c virtio_net net_failover serio_raw failover ata_generic virtio_blk pata_acpi floppy
[40595.863155] CR2: 0000000000000108
[40595.863551] ---[ end trace 22209bbcacb4addd ]---

Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_tunnel.c  |  3 +++
 net/ipv6/ip6_tunnel.c | 13 +++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 38c02bb62e2c..c6713c7287df 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		goto tx_error;
 	}
 
+	if (skb_dst(skb) && !skb_dst(skb)->dev)
+		skb_dst(skb)->dev = rt->dst.dev;
+
 	if (key->tun_flags & TUNNEL_DONT_FRAGMENT)
 		df = htons(IP_DF);
 	if (tnl_update_pmtu(dev, skb, rt, df, inner_iph, tunnel_hlen,
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 754a484d35df..6ccf8f0eb8e7 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1109,10 +1109,15 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 			dst = NULL;
 			goto tx_err_link_failure;
 		}
-		if (t->parms.collect_md && ipv6_addr_any(&fl6->saddr) &&
-		    ipv6_dev_get_saddr(net, ip6_dst_idev(dst)->dev,
-				       &fl6->daddr, 0, &fl6->saddr))
-			goto tx_err_link_failure;
+		if (t->parms.collect_md) {
+			if (ipv6_addr_any(&fl6->saddr) &&
+			    ipv6_dev_get_saddr(net, ip6_dst_idev(dst)->dev,
+					       &fl6->daddr, 0, &fl6->saddr))
+				goto tx_err_link_failure;
+
+			if (skb_dst(skb) && !skb_dst(skb)->dev)
+				skb_dst(skb)->dev = dst->dev;
+		}
 		ndst = dst;
 	}
 
-- 
2.19.2


^ permalink raw reply related

* Re: [PATCH net-next v2 11/14] netdevsim: Add devlink-trap support
From: Ido Schimmel @ 2019-08-15  6:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, nhorman, jiri, toke, dsahern, roopa, nikolay, andy,
	f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190814165957.0e626f57@cakuba.netronome.com>

On Wed, Aug 14, 2019 at 04:59:57PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Aug 2019 10:53:57 +0300, Ido Schimmel wrote:
> > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > index 08ca59fc189b..2758d95c8d18 100644
> > --- a/drivers/net/netdevsim/dev.c
> > +++ b/drivers/net/netdevsim/dev.c
> > @@ -17,11 +17,21 @@
> >  
> >  #include <linux/debugfs.h>
> >  #include <linux/device.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/inet.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/kernel.h>
> >  #include <linux/list.h>
> >  #include <linux/mutex.h>
> >  #include <linux/random.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/random.h>
> >  #include <linux/rtnetlink.h>
> >  #include <net/devlink.h>
> > +#include <net/ip.h>
> > +#include <uapi/linux/devlink.h>
> > +#include <uapi/linux/ip.h>
> > +#include <uapi/linux/udp.h>
> 
> Please keep includes ordered alphabetically. You're adding
> linux/random.h second time.

Will fix.

> 
> >  #include "netdevsim.h"
> 
> > +static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
> > +{
> > +	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
> > +	struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
> > +	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > +	int i;
> 
> reverse christmas tree, please

Likewise.

Thanks!

^ permalink raw reply

* [PATCH] net: hns: hns_enet: Add of_node_put in hns_nic_dev_probe()
From: Nishka Dasgupta @ 2019-08-15  6:28 UTC (permalink / raw)
  To: yisen.zhuang, salil.mehta, davem, netdev; +Cc: Nishka Dasgupta

The local variable ae_node in function hns_nic_dev_probe takes the
return value of of_parse_phandle, which gets a node but does not put it.
This may cause a memory leak. Hence put ae_node after the last time it
is invoked.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 2235dd55fab2..b26e84929e1e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -2309,6 +2309,7 @@ static int hns_nic_dev_probe(struct platform_device *pdev)
 			goto out_read_prop_fail;
 		}
 		priv->fwnode = &ae_node->fwnode;
+		of_node_put(ae_node);
 	} else if (is_acpi_node(dev->fwnode)) {
 		struct fwnode_reference_args args;
 
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH net-next v2 13/14] selftests: devlink_trap: Add test cases for devlink-trap
From: Ido Schimmel @ 2019-08-15  6:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, nhorman, jiri, toke, dsahern, roopa, nikolay, andy,
	f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190814174229.1ab4fd1b@cakuba.netronome.com>

On Wed, Aug 14, 2019 at 05:42:29PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Aug 2019 10:53:59 +0300, Ido Schimmel wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> > 
> > Add test cases for devlink-trap on top of the netdevsim implementation.
> > 
> > The tests focus on the devlink-trap core infrastructure and user space
> > API. They test both good and bad flows and also dismantle of the netdev
> > and devlink device used to report trapped packets.
> > 
> > This allows device drivers to focus their tests on device-specific
> > functionality.
> > 
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > Acked-by: Jiri Pirko <jiri@mellanox.com>
> 
> Thanks for the test!
> 
> Should it perhaps live in:
> tools/testing/selftests/drivers/net/netdevsim/
> ?
> 
> That's where Jiri puts his devlink tests..

Yea, good point. Will move it there.

> 
> Also the test seems to require netdevsim to be loaded, otherwise:
> # ./devlink_trap.sh 
> SKIP: No netdevsim support
> 
> Is that expected?

No, my bad. I need to change the check to see if netdevsim is loaded and
otherwise load the module.

Thanks!

^ permalink raw reply

* Re: [PATCH] Bluetooth: 6lowpan: Make variable header_ops constant
From: Marcel Holtmann @ 2019-08-15  6:46 UTC (permalink / raw)
  To: Nishka Dasgupta
  Cc: Johan Hedberg, linux-bluetooth, linux-kernel, David S. Miller,
	netdev
In-Reply-To: <20190815055255.1153-1-nishkadg.linux@gmail.com>

Hi Nishka,

> Static variable header_ops, of type header_ops, is used only once, when
> it is assigned to field header_ops of a variable having type net_device.
> This corresponding field is declared as const in the definition of
> net_device. Hence make header_ops constant as well to protect it from
> unnecessary modification.
> Issue found with Coccinelle.
> 
> Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
> ---
> net/bluetooth/6lowpan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel


^ permalink raw reply

* Re: [PATCH] netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argument
From: Joe Perches @ 2019-08-15  6:56 UTC (permalink / raw)
  To: Nathan Chancellor, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal
  Cc: David S. Miller, netfilter-devel, coreteam, netdev, linux-kernel,
	clang-built-linux, kbuild test robot
In-Reply-To: <20190814165809.46421-1-natechancellor@gmail.com>

On Wed, 2019-08-14 at 09:58 -0700, Nathan Chancellor wrote:
> clang warns:
> 
> net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp'
> call is a comparison [-Werror,-Wmemsize-comparison]
>         if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
>                                       ~~~~~~~~~~~~~~~~~~^~
> net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the
> result of 'memcmp' instead?
>         if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
>             ^
>                                                        )
> net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument
> to size_t to silence this warning
>         if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
>                                       ^
>                                       (size_t)(
> 1 error generated.
> 
> Adjust the parentheses so that the result of the sizeof is used for the
> size argument in memcmp, rather than the result of the comparison (which
> would always be true because sizeof is a non-zero number).
> 
> Fixes: bd8699e9e292 ("netfilter: nft_bitwise: add offload support")
> Link: https://github.com/ClangBuiltLinux/linux/issues/638
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  net/netfilter/nft_bitwise.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
[]
> @@ -135,8 +135,8 @@ static int nft_bitwise_offload(struct nft_offload_ctx *ctx,
>  {
>  	const struct nft_bitwise *priv = nft_expr_priv(expr);
>  
> -	if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
> -	    priv->sreg != priv->dreg))
> +	if (memcmp(&priv->xor, &zero, sizeof(priv->xor)) ||
> +	    priv->sreg != priv->dreg)

This code should use memchr_inv and not compare against a
static uninitialized struct.

Perhaps linux should introduce and use memcchr like bsd. 
or just add something like #define memcchr memchr_inv





^ permalink raw reply

* Re: WARNING in is_bpf_text_address
From: Will Deacon @ 2019-08-15  7:51 UTC (permalink / raw)
  To: syzbot, bvanassche
  Cc: akpm, ast, bpf, bvanassche, daniel, davem, dvyukov, hawk, hdanton,
	jakub.kicinski, johannes.berg, johannes, john.fastabend, kafai,
	linux-kernel, longman, mingo, netdev, paulmck, peterz,
	songliubraving, syzkaller-bugs, tglx, tj, torvalds, will.deacon,
	xdp-newbies, yhs
In-Reply-To: <000000000000e56cb0058fcc6c28@google.com>

Hi Bart,

On Sat, Aug 10, 2019 at 05:24:06PM -0700, syzbot wrote:
> syzbot has found a reproducer for the following crash on:
> 
> HEAD commit:    451577f3 Merge tag 'kbuild-fixes-v5.3-3' of git://git.kern..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=120850a6600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2031e7d221391b8a
> dashboard link: https://syzkaller.appspot.com/bug?extid=bd3bba6ff3fcea7a6ec6
> compiler:       clang version 9.0.0 (/home/glider/llvm/clang
> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=130ffe4a600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17137d2c600000
> 
> The bug was bisected to:
> 
> commit a0b0fd53e1e67639b303b15939b9c653dbe7a8c4
> Author: Bart Van Assche <bvanassche@acm.org>
> Date:   Thu Feb 14 23:00:46 2019 +0000
> 
>     locking/lockdep: Free lock classes that are no longer in use
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=152f6a9da00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=172f6a9da00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=132f6a9da00000

I know you don't think much to these reports, but please could you have a
look (even if it's just to declare it a false positive)?

Cheers,

Will

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox