Netdev List
* [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
From: Daniel Borkmann @ 2018-10-17 14:41 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: peterz, paulmck, will.deacon, acme, yhs, john.fastabend, netdev,
	Daniel Borkmann
In-Reply-To: <20181017144156.16639-1-daniel@iogearbox.net>

Switch both rmb()/mb() barriers to more lightweight smp_rmb()/smp_mb()
ones. When walking the perf ring buffer they pair the following way,
quoting kernel/events/ring_buffer.c:

  Since the mmap() consumer (userspace) can run on a different CPU:

    kernel                               user

    if (LOAD ->data_tail) {              LOAD ->data_head
                          (A)            smp_rmb()       (C)
      STORE $data                        LOAD $data
      smp_wmb()           (B)            smp_mb()        (D)
      STORE ->data_head                  STORE ->data_tail
    }

  Where A pairs with D, and B pairs with C.

  In our case (A) is a control dependency that separates the load
  of the ->data_tail and the stores of $data. In case ->data_tail
  indicates there is no room in the buffer to store $data we do not.

  D needs to be a full barrier since it separates the data READ from
  the tail WRITE.

  For B a WMB is sufficient since it separates two WRITEs, and for C
  an RMB is sufficient since it separates two READs.

Currently, on x86-64, perf uses LFENCE and MFENCE, which is overkill:
we can use more lightweight barriers, in particular given this is a
fast path.

According to Peter, rmb()/mb() were added via a94d342b9cb0
("tools/perf: Add required memory barriers") at a time when the kernel
still supported chips that needed them. Support for these chips has
since been dropped completely, so we can fix the barriers up as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/mmap.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47..de6dc2e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -73,7 +73,8 @@ static inline u64 perf_mmap__read_head(struct perf_mmap *mm)
 {
 	struct perf_event_mmap_page *pc = mm->base;
 	u64 head = READ_ONCE(pc->data_head);
-	rmb();
+
+	smp_rmb();
 	return head;
 }
 
@@ -84,7 +85,7 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail)
 	/*
 	 * ensure all reads are done before we write the tail out.
 	 */
-	mb();
+	smp_mb();
 	pc->data_tail = tail;
 }
 
-- 
2.9.5
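
A minimal user-space sketch of the consumer side, (C) and (D) in the
diagram quoted in the commit message above. It assumes C11 atomics as
portable stand-ins that are at least as strong as the kernel-style
primitives: an acquire load in place of READ_ONCE() + smp_rmb(), and a
seq_cst fence in place of smp_mb(). The struct and function names are
hypothetical and are not part of perf or the tools/ headers.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail fields of the mmap'ed control page. */
struct ring {
	_Atomic uint64_t data_head;	/* written by the producer (kernel) */
	_Atomic uint64_t data_tail;	/* written by the consumer (user) */
	unsigned char *data;		/* buffer, size must be a power of two */
	uint64_t mask;			/* buffer size - 1 */
};

/* (C): load ->data_head and order it before the data loads that follow. */
static uint64_t ring_read_head(struct ring *r)
{
	return atomic_load_explicit(&r->data_head, memory_order_acquire);
}

/* (D): all data loads must be complete before ->data_tail is published. */
static void ring_write_tail(struct ring *r, uint64_t tail)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&r->data_tail, tail, memory_order_relaxed);
}

/* Copy out up to len unread bytes; returns the number of bytes consumed. */
static size_t ring_consume(struct ring *r, unsigned char *out, size_t len)
{
	uint64_t tail = atomic_load_explicit(&r->data_tail, memory_order_relaxed);
	uint64_t head = ring_read_head(r);
	size_t n = 0;

	while (tail != head && n < len)
		out[n++] = r->data[tail++ & r->mask];

	ring_write_tail(r, tail);
	return n;
}
```

The tail is only written after the copy loop, so the producer cannot
reuse a slot the consumer is still reading.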


* [PATCH bpf-next 1/3] tools: add smp_* barrier variants to include infrastructure
From: Daniel Borkmann @ 2018-10-17 14:41 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: peterz, paulmck, will.deacon, acme, yhs, john.fastabend, netdev,
	Daniel Borkmann
In-Reply-To: <20181017144156.16639-1-daniel@iogearbox.net>

Add definitions for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure. This patch adds the implementation
for x86-64 and arm64, and falls back to rmb()/wmb()/mb() on other
archs that do not implement them yet, so that those can be added
successively by people who have access to test machines. The
x86-64 smp_mb() uses a lock + add combination on an address
below the red zone.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/arch/arm64/include/asm/barrier.h | 10 ++++++++++
 tools/arch/x86/include/asm/barrier.h   |  9 ++++++---
 tools/include/asm/barrier.h            | 11 +++++++++++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/arch/arm64/include/asm/barrier.h b/tools/arch/arm64/include/asm/barrier.h
index 40bde6b..acf1f06 100644
--- a/tools/arch/arm64/include/asm/barrier.h
+++ b/tools/arch/arm64/include/asm/barrier.h
@@ -14,4 +14,14 @@
 #define wmb()		asm volatile("dmb ishst" ::: "memory")
 #define rmb()		asm volatile("dmb ishld" ::: "memory")
 
+/*
+ * Kernel uses dmb variants on arm64 for smp_*() barriers. Pretty much the same
+ * implementation as above mb()/wmb()/rmb(), though for the latter kernel uses
+ * dsb. In any case, should above mb()/wmb()/rmb() change, make sure the below
+ * smp_*() don't.
+ */
+#define smp_mb()	asm volatile("dmb ish" ::: "memory")
+#define smp_wmb()	asm volatile("dmb ishst" ::: "memory")
+#define smp_rmb()	asm volatile("dmb ishld" ::: "memory")
+
 #endif /* _TOOLS_LINUX_ASM_AARCH64_BARRIER_H */
diff --git a/tools/arch/x86/include/asm/barrier.h b/tools/arch/x86/include/asm/barrier.h
index 8774dee..c97c0c5 100644
--- a/tools/arch/x86/include/asm/barrier.h
+++ b/tools/arch/x86/include/asm/barrier.h
@@ -21,9 +21,12 @@
 #define rmb()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
 #define wmb()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
 #elif defined(__x86_64__)
-#define mb() 	asm volatile("mfence":::"memory")
-#define rmb()	asm volatile("lfence":::"memory")
-#define wmb()	asm volatile("sfence" ::: "memory")
+#define mb()      asm volatile("mfence" ::: "memory")
+#define rmb()     asm volatile("lfence" ::: "memory")
+#define wmb()     asm volatile("sfence" ::: "memory")
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#define smp_mb()  asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
 #endif
 
 #endif /* _TOOLS_LINUX_ASM_X86_BARRIER_H */
diff --git a/tools/include/asm/barrier.h b/tools/include/asm/barrier.h
index 391d942..e4c8845 100644
--- a/tools/include/asm/barrier.h
+++ b/tools/include/asm/barrier.h
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/compiler.h>
 #if defined(__i386__) || defined(__x86_64__)
 #include "../../arch/x86/include/asm/barrier.h"
 #elif defined(__arm__)
@@ -26,3 +27,13 @@
 #else
 #include <asm-generic/barrier.h>
 #endif
+/* Fallback definitions for archs that haven't been updated yet. */
+#ifndef smp_rmb
+# define smp_rmb()	rmb()
+#endif
+#ifndef smp_wmb
+# define smp_wmb()	wmb()
+#endif
+#ifndef smp_mb
+# define smp_mb()	mb()
+#endif
-- 
2.9.5
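
A self-contained sketch of the pattern used in the patch above:
per-arch definitions first, then #ifndef fallbacks. It assumes GCC or
Clang. The my_ prefix and the publish()/consume() helpers are
hypothetical, and the fallback uses the compiler's __sync_synchronize()
full barrier rather than the rmb()/wmb()/mb() macros from the tools
headers, so that the example builds on its own.

```c
/* Compiler-only barrier, the same idea as barrier() in the tools headers. */
#define my_barrier()	__asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
/* As in the patch: read and write barriers are compiler barriers on x86-64. */
# define my_smp_rmb()	my_barrier()
# define my_smp_wmb()	my_barrier()
/* Locked add of 0 to a slot below the red zone acts as a full barrier. */
# define my_smp_mb()	__asm__ __volatile__("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
#elif defined(__aarch64__)
# define my_smp_mb()	__asm__ __volatile__("dmb ish" ::: "memory")
# define my_smp_wmb()	__asm__ __volatile__("dmb ishst" ::: "memory")
# define my_smp_rmb()	__asm__ __volatile__("dmb ishld" ::: "memory")
#endif

/* Fallbacks for archs without dedicated definitions (assumption: full barrier). */
#ifndef my_smp_rmb
# define my_smp_rmb()	__sync_synchronize()
#endif
#ifndef my_smp_wmb
# define my_smp_wmb()	__sync_synchronize()
#endif
#ifndef my_smp_mb
# define my_smp_mb()	__sync_synchronize()
#endif

static int payload;
static volatile int ready;

/* Writer: store the data, then the flag, with a write barrier in between. */
static void publish(int value)
{
	payload = value;
	my_smp_wmb();
	ready = 1;
}

/* Reader: check the flag, then read the data, with a read barrier in between. */
static int consume(int *out)
{
	if (!ready)
		return 0;
	my_smp_rmb();
	*out = payload;
	return 1;
}
```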


* [PATCH bpf-next 0/3] improve and fix barriers for walking perf rb
From: Daniel Borkmann @ 2018-10-17 14:41 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: peterz, paulmck, will.deacon, acme, yhs, john.fastabend, netdev,
	Daniel Borkmann

This set first adds smp_* barrier variants to tools infrastructure
and in a second step updates perf and libbpf to make use of them.
For details, please see individual patches, thanks!

Arnaldo, if there are no objections, could this be routed via bpf-next
with Acked-by's due to later dependencies in libbpf? Alternatively,
I could also get the 2nd patch out during merge window, but perhaps
it's okay to do in one go as there shouldn't be much conflict in perf.

Thanks!

Daniel Borkmann (3):
  tools: add smp_* barrier variants to include infrastructure
  tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
  bpf, libbpf: use proper barriers in perf ring buffer walk

 tools/arch/arm64/include/asm/barrier.h | 10 ++++++++++
 tools/arch/x86/include/asm/barrier.h   |  9 ++++++---
 tools/include/asm/barrier.h            | 11 +++++++++++
 tools/lib/bpf/libbpf.c                 | 25 +++++++++++++++++++------
 tools/perf/util/mmap.h                 |  5 +++--
 5 files changed, 49 insertions(+), 11 deletions(-)

-- 
2.9.5


* [PATCH bpf] bpf: fix doc of bpf_skb_adjust_room() in uapi
From: Nicolas Dichtel @ 2018-10-17 14:24 UTC (permalink / raw)
  To: ast, daniel, davem; +Cc: netdev, Nicolas Dichtel, Quentin Monnet

len_diff is signed.

Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
CC: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/bpf.h       | 2 +-
 tools/include/uapi/linux/bpf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 flags)
  * 	Description
  * 		Grow or shrink the room for data in the packet associated to
  * 		*skb* by *len_diff*, and according to the selected *mode*.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 flags)
  * 	Description
  * 		Grow or shrink the room for data in the packet associated to
  * 		*skb* by *len_diff*, and according to the selected *mode*.
-- 
2.18.0
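
A small sketch of why the signedness in the documented prototype
matters: shrinking the room means passing a negative len_diff, and the
same bits read as u32 describe a huge grow instead. The helper names
below are hypothetical and only illustrate the arithmetic; they are not
the kernel helper.

```c
#include <stdint.h>

/* Length after applying a signed delta (s32 in the kernel's terms). */
static int64_t new_len_signed(uint32_t cur_len, int32_t len_diff)
{
	return (int64_t)cur_len + len_diff;
}

/* Same bits interpreted as unsigned (u32): a shrink becomes a huge grow. */
static int64_t new_len_unsigned(uint32_t cur_len, uint32_t len_diff)
{
	return (int64_t)cur_len + len_diff;
}
```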



* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-17 13:45 UTC (permalink / raw)
  To: sd; +Cc: netdev
In-Reply-To: <CACcUnf8kTNEYW8ZisbEYMr+Pu7rmP7pKS=w+-ynUJ_sT75HZ0w@mail.gmail.com>

I've got wpa_supplicant working with macsec on Fedora, and my test bed
has shuffled 16 billion packets so far without interruption.  I am a bit
concerned that I've just pushed the resource exhaustion issue down the
road, though.  Looking at the output of ip macsec show, I see four SAs
for TX and RX, and it appears to negotiate a new pair every 3 to 3.5
billion packets.  It doesn't appear to be tearing down old SAs.  What
happens when the available SA slots run out?

Joshua Coombs
GWI

office 207-494-2140
www.gwi.net

On Mon, Oct 15, 2018 at 11:45 AM Josh Coombs <jcoombs@staff.gwi.net> wrote:
>
> And confirmed: starting with a high packet number results in a very
> short testbed run, 296 packets and then nothing, just as you surmised.
> Sorry for raising a false alarm.  Looks like I need to roll my own
> build of wpa_supplicant, as the Ubuntu builds don't include the macsec
> driver; I haven't tested Gentoo's ebuilds yet to see if they do.
>
> Josh Coombs
>
> On Sun, Oct 14, 2018 at 4:52 PM Josh Coombs <jcoombs@staff.gwi.net> wrote:
> >
> > On Sun, Oct 14, 2018 at 4:24 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > >
> > > 2018-10-14, 10:59:31 -0400, Josh Coombs wrote:
> > > > I initially mistook this for a traffic control issue, but after
> > > > stripping the test beds down to just the MACSec component, I can still
> > > > replicate the issue.  After approximately 5TB of transfer / 4 billion
> > > > packets over a MACSec link it stops passing traffic.
> > >
> > > I think you're just hitting packet number exhaustion. After 2^32
> > > packets, the packet number would wrap to 0 and start being reused,
> > > which breaks the crypto used by macsec. Before this point, you have to
> > > add a new SA, and tell the macsec device to switch to it.
> >
> > I had not considered that. I naively thought that as long as I didn't
> > specify a replay window, it would roll the PN over on its own and life
> > would be good.  I'll test that theory tomorrow; it should be easy to
> > prove out.
> >
> > > That's why you should be using wpa_supplicant. It will monitor the
> > > growth of the packet number, and handle the rekey for you.
> >
> > Thank you for the heads up, I'll read up on this as well.
> >
> > Josh C


* [PATCH net] sctp: fix the data size calculation in sctp_data_size
From: Xin Long @ 2018-10-17 13:11 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

The SCTP data size should be calculated by subtracting the data chunk
header's length from chunk_hdr->length, not just the data header's length.

Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/sm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 5ef1bad..9e3d327 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -347,7 +347,7 @@ static inline __u16 sctp_data_size(struct sctp_chunk *chunk)
 	__u16 size;
 
 	size = ntohs(chunk->chunk_hdr->length);
-	size -= sctp_datahdr_len(&chunk->asoc->stream);
+	size -= sctp_datachk_len(&chunk->asoc->stream);
 
 	return size;
 }
-- 
2.1.0
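
A user-space sketch of the size calculation fixed above, for a plain
DATA chunk only. The two structs are local stand-ins that assume the
DATA chunk layout (a 4-byte chunk header followed by a 12-byte data
header); the names are hypothetical, and the stream-dependent choice
between DATA and I-DATA that the kernel helpers make is left out.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Common chunk header: 4 bytes. */
struct chunk_hdr {
	uint8_t type;
	uint8_t flags;
	uint16_t length;	/* network byte order, covers header + payload */
};

/* DATA-specific header that follows the chunk header: 12 bytes. */
struct data_hdr {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	uint32_t ppid;
};

/* Fixed behaviour: subtract the whole data chunk header (4 + 12 bytes). */
static uint16_t data_size(const struct chunk_hdr *ch)
{
	uint16_t size = ntohs(ch->length);

	size = (uint16_t)(size - sizeof(struct chunk_hdr) - sizeof(struct data_hdr));
	return size;
}

/* Old behaviour: only the data header is subtracted, so 4 bytes too many remain. */
static uint16_t data_size_old(const struct chunk_hdr *ch)
{
	uint16_t size = ntohs(ch->length);

	size = (uint16_t)(size - sizeof(struct data_hdr));
	return size;
}
```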


* Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
From: Arnaldo Carvalho de Melo @ 2018-10-17 12:50 UTC (permalink / raw)
  To: Song Liu
  Cc: David Ahern, Alexei Starovoitov, Peter Zijlstra,
	Alexei Starovoitov, Alexey Budankov, David S . Miller,
	Daniel Borkmann, Namhyung Kim, Jiri Olsa, Networking, kernel-team
In-Reply-To: <20181017121140.GA31465@kernel.org>

On Wed, Oct 17, 2018 at 09:11:40AM -0300, Arnaldo Carvalho de Melo wrote:
> Adding Alexey, Jiri and Namhyung as they worked/are working on
> multithreading 'perf record'.
> 
> On Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu wrote:
> > On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@gmail.com> wrote:
> > > On 10/15/18 4:33 PM, Song Liu wrote:
> > > > I am working with Alexei on the idea of fetching BPF program information via
> > > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > > to perf_event_type, and dumped these events to perf event ring buffer.
> 
> > > > I found that perf will not process events until the end of perf-record:
> 
> > > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > > ...... 10 seconds later
> > > > [ perf record: Woken up 34 times to write data ]
> > > > machine__process_bpf_event: prog_id 6 loaded
> > > > machine__process_bpf_event: prog_id 6 unloaded
> > > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> 
> > > > In this example, the bpf program was loaded and then unloaded in
> > > > another terminal. When machine__process_bpf_event() processes
> > > > the load event, the bpf program is already unloaded. Therefore,
> > > > machine__process_bpf_event() will not be able to get information
> > > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.
> 
> > > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > > as soon as perf gets the event from the kernel. I looked around the perf
> > > > code for a while, but I haven't found a good example where some
> > > > events are processed before the end of perf-record. Could you
> > > > please help me with this?
> 
> > > perf record does not process events as they are generated. Its sole job
> > > is pushing data from the maps to a file as fast as possible meaning in
> > > bulk based on current read and write locations.
> 
> > > Adding code to process events will add significant overhead to the
> > > record command and will not really solve your race problem.
> 
> > I agree that processing events while recording has significant overhead.
> > In this case, perf user space needs to know details about the JITed BPF
> > program. It is impossible to pass all these details to user space through
> > the relatively stable ring_buffer API. Therefore, some processing of the
> > data is necessary (get the bpf prog_id from the ring buffer, and then fetch
> > program details via BPF_OBJ_GET_INFO_BY_FD).
>  
> > I have some ideas for processing important data with relatively low overhead.
> > Let me try to implement them.
> 
> Well, you could have a separate thread processing just those kinds of
> events, associate it with a dummy event where you only ask for
> PERF_RECORD_BPF_EVENTs.
> 
> Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
> perf_event_attr:
> 
> [root@seventh ~]# perf record -vv -e dummy sleep 01
> ------------------------------------------------------------
> perf_event_attr:
>   type                             1
>   size                             112
>   config                           0x9
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   disabled                         1
>   inherit                          1

You would have the following disabled: there is no need for
PERF_RECORD_{MMAP*,COMM,FORK,EXIT}, just PERF_RECORD_BPF_EVENT.

>   mmap                             1
>   comm                             1
>   task                             1
>   mmap2                            1
>   comm_exec                        1


>   freq                             1
>   enable_on_exec                   1
>   sample_id_all                    1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
> mmap size 528384B
> perf event ring buffer mmapped per cpu
> Synthesizing TSC conversion information
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data ]
> [root@seventh ~]#
> 
> [root@seventh ~]# perf evlist -v
> dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> [root@seventh ~]# 
> 
> There is ongoing work on dumping one file per cpu and then, at post
> processing time, merging all those files to get ordering. One more
> file, for these VIP events that require per-event processing, would be
> ordered at that time with all the other per-cpu files.
> 
> - Arnaldo
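
A minimal sketch of the dummy-event setup suggested in this message,
filling a perf_event_attr the way the dump shows (type 1, config 0x9)
but with the mmap/comm/task side-band bits left off. The function name
is hypothetical. The bit that would request PERF_RECORD_BPF_EVENT is
deliberately not set here, since that record type is only proposed in
this thread. The attr would then be handed to perf_event_open(2), once
per CPU as in the log above.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Fill attr for a software dummy event that carries no samples of its own. */
static void init_dummy_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;	/* "type 1" in the dump */
	attr->config = PERF_COUNT_SW_DUMMY;	/* "config 0x9" in the dump */
	attr->disabled = 1;
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;
	/*
	 * mmap, comm, task, mmap2 and comm_exec stay 0: this event is not
	 * meant to collect PERF_RECORD_{MMAP*,COMM,FORK,EXIT}.
	 */
}
```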


* [PATCH V1 net-next] net: ena: enable Low Latency Queues
From: akiyano @ 2018-10-17 12:33 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi

From: Arthur Kiyanovski <akiyano@amazon.com>

Use the new API to enable usage of LLQ.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 284a0a6..18956e7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3022,20 +3022,10 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev,
 	int io_sq_num, io_queue_num;
 
 	/* In case of LLQ use the llq number in the get feature cmd */
-	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) {
-		io_sq_num = get_feat_ctx->max_queues.max_legacy_llq_num;
-
-		if (io_sq_num == 0) {
-			dev_err(&pdev->dev,
-				"Trying to use LLQ but llq_num is 0. Fall back into regular queues\n");
-
-			ena_dev->tx_mem_queue_type =
-				ENA_ADMIN_PLACEMENT_POLICY_HOST;
-			io_sq_num = get_feat_ctx->max_queues.max_sq_num;
-		}
-	} else {
+	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
+		io_sq_num = get_feat_ctx->llq.max_llq_num;
+	else
 		io_sq_num = get_feat_ctx->max_queues.max_sq_num;
-	}
 
 	io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
 	io_queue_num = min_t(int, io_queue_num, io_sq_num);
@@ -3238,7 +3228,7 @@ static int ena_calc_queue_size(struct pci_dev *pdev,
 
 	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
 		queue_size = min_t(u32, queue_size,
-				   get_feat_ctx->max_queues.max_legacy_llq_depth);
+				   get_feat_ctx->llq.max_llq_depth);
 
 	queue_size = rounddown_pow_of_two(queue_size);
 
-- 
2.7.4


* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17 12:31 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC72006.9010000@huawei.com>


On 2018/10/17 7:41 PM, jiangyiwen wrote:
> On 2018/10/17 17:51, Jason Wang wrote:
>> On 2018/10/17 5:39 PM, Jason Wang wrote:
>>>> Hi Jason and Stefan,
>>>>
>>>> Maybe I have found the reason for the bad performance.
>>>>
>>>> I found that pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K),
>>>> which limits the bandwidth to 500~600MB/s. Once I increase it to
>>>> 64K, throughput improves about 3 times (~1500MB/s).
>>>
>>> Looks like the value was chosen as a balance between rx buffer size and performance. Always allocating 64K, even for small packets, is wasteful and puts stress on guest memory. Virtio-net tries to avoid this with the mergeable rx buffer, which allows a big packet to be scattered across different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport for vsock.
>>>
>>> What is interesting is that the performance is still behind vhost-net.
>>>
>>> Thanks
>>>
>>>> By the way, I send 64K at a time from the application, and I don't use
>>>> sg_init_one; I rewrote the function to build an sg list for the packet,
>>>> because pkt_len spans multiple pages.
>>>>
>>>> Thanks,
>>>> Yiwen.
>>
>> Btw, if you're using vsock for transferring large files, maybe it's more efficient to implement sendpage() for vsock to allow sendfile()/splice() to work.
>>
>> Thanks
>>
> I can't agree more.
>
> Why is vhost_vsock still behind vhost_net?
> I used sendfile() to test performance at first, and then found that
> vsock doesn't implement sendpage(), so the bandwidth couldn't be
> increased. So I used read() and send() in place of sendfile(), which
> adds switches between kernel and user mode, whereas sendfile()
> can support zero copy. I think this is the main reason.
>
> Thanks.


Want to post patches for this then :) ?

Thanks


>
>> .
>>
>


* Re: [RFC PATCH 0/3] M_CAN Framework rework
From: Dan Murphy @ 2018-10-17 20:21 UTC (permalink / raw)
  To: wg, mkl, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <20181010142055.25271-1-dmurphy@ti.com>

Bump

On 10/10/2018 09:20 AM, Dan Murphy wrote:
> All
> 
> This patch series creates a m_can core framework that devices can register
> to.  The m_can core manages the Bosch IP and CAN frames.  Each device that
> is registered is responsible for managing device specific functions.
> 
> This rewrite was suggested in a device driver submission for the TCAN4x5x
> device.
> Reference upstream post:
> https://lore.kernel.org/patchwork/patch/984163/
> 
> For instance the TCAN device is a SPI device that uses a specific data payload to
> determine writes and reads.  In addition the device has a reset input as well
> as a wakeup pin.  The register offset of the m_can registers differs and must
> be set by the device attached to the core.
> 
> The m_can core will use iomapped writes and reads as the default mechanism for
> writing and reading.  The device driver can provide overrides for this.
> 
> This patch series is not complete, as it neither handles the CAN interrupts
> nor can perform a CAN write.  If this patch series is deemed acceptable I will
> finish debugging the driver and post a non-RFC series.
> 
> Finally, I did attempt to reduce the first patch with various git format-patch
> options, but none seemed to shrink it.
> 
> Dan
> 
> Dan Murphy (3):
>   can: m_can: Create m_can core to leverage common code
>   dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver
>   can: tcan4x5x: Add tcan4x5x driver to the kernel
> 
>  .../devicetree/bindings/net/can/tcan4x5x.txt  |   34 +
>  drivers/net/can/m_can/Kconfig                 |   18 +
>  drivers/net/can/m_can/Makefile                |    4 +-
>  drivers/net/can/m_can/m_can.c                 | 1683 +----------------
>  .../net/can/m_can/{m_can.c => m_can_core.c}   |  479 +++--
>  drivers/net/can/m_can/m_can_core.h            |  100 +
>  drivers/net/can/m_can/tcan4x5x.c              |  321 ++++
>  7 files changed, 722 insertions(+), 1917 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/can/tcan4x5x.txt
>  copy drivers/net/can/m_can/{m_can.c => m_can_core.c} (83%)
>  create mode 100644 drivers/net/can/m_can/m_can_core.h
>  create mode 100644 drivers/net/can/m_can/tcan4x5x.c
> 


-- 
------------------
Dan Murphy
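
A user-space sketch of the split described in the cover letter above: a
core that defaults to memory-mapped register access, with optional
per-device overrides and a per-device register offset. Every name here
(mcan_dev, mcan_ops, spi_read_reg and so on) is hypothetical and is not
taken from the patches; a plain array stands in for both the iomapped
window and the registers behind SPI.

```c
#include <stdint.h>

struct mcan_dev;

/* Optional per-device register accessors. */
struct mcan_ops {
	uint32_t (*read_reg)(struct mcan_dev *dev, uint32_t reg);
	void (*write_reg)(struct mcan_dev *dev, uint32_t reg, uint32_t val);
};

struct mcan_dev {
	const struct mcan_ops *ops;	/* NULL: use the memory-mapped default */
	uint32_t reg_offset;		/* where the M_CAN block starts on this device */
	uint32_t *regs;			/* stand-in for the register space */
};

/* Core read: device override if present, else direct register access. */
static uint32_t mcan_read(struct mcan_dev *dev, uint32_t reg)
{
	if (dev->ops && dev->ops->read_reg)
		return dev->ops->read_reg(dev, dev->reg_offset + reg);
	return dev->regs[(dev->reg_offset + reg) / 4];
}

/* Core write: device override if present, else direct register access. */
static void mcan_write(struct mcan_dev *dev, uint32_t reg, uint32_t val)
{
	if (dev->ops && dev->ops->write_reg) {
		dev->ops->write_reg(dev, dev->reg_offset + reg, val);
		return;
	}
	dev->regs[(dev->reg_offset + reg) / 4] = val;
}

/* Fake SPI-attached device: every access counts as one bus transfer. */
static unsigned int spi_transfers;

static uint32_t spi_read_reg(struct mcan_dev *dev, uint32_t reg)
{
	spi_transfers++;
	return dev->regs[reg / 4];
}

static void spi_write_reg(struct mcan_dev *dev, uint32_t reg, uint32_t val)
{
	spi_transfers++;
	dev->regs[reg / 4] = val;
}

static const struct mcan_ops spi_ops = {
	.read_reg = spi_read_reg,
	.write_reg = spi_write_reg,
};
```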


* Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
From: Arnaldo Carvalho de Melo @ 2018-10-17 12:11 UTC (permalink / raw)
  To: Song Liu
  Cc: David Ahern, Alexei Starovoitov, Peter Zijlstra,
	Alexei Starovoitov, Alexey Budankov, David S . Miller,
	Daniel Borkmann, Namhyung Kim, Jiri Olsa, Networking, kernel-team
In-Reply-To: <CAPhsuW7zZE51ibma__y8SDVUU_YQMjGyHRhYDhBsuJF_b89h6g@mail.gmail.com>

Adding Alexey, Jiri and Namhyung as they worked/are working on
multithreading 'perf record'.

On Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu wrote:
> On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@gmail.com> wrote:
> > On 10/15/18 4:33 PM, Song Liu wrote:
> > > I am working with Alexei on the idea of fetching BPF program information via
> > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > to perf_event_type, and dumped these events to perf event ring buffer.

> > > I found that perf will not process events until the end of perf-record:

> > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > ...... 10 seconds later
> > > [ perf record: Woken up 34 times to write data ]
> > > machine__process_bpf_event: prog_id 6 loaded
> > > machine__process_bpf_event: prog_id 6 unloaded
> > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]

> > > In this example, the bpf program was loaded and then unloaded in
> > > another terminal. When machine__process_bpf_event() processes
> > > the load event, the bpf program is already unloaded. Therefore,
> > > machine__process_bpf_event() will not be able to get information
> > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.

> > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > as soon as perf gets the event from the kernel. I looked around the perf
> > > code for a while, but I haven't found a good example where some
> > > events are processed before the end of perf-record. Could you
> > > please help me with this?

> > perf record does not process events as they are generated. Its sole job
> > is pushing data from the maps to a file as fast as possible meaning in
> > bulk based on current read and write locations.

> > Adding code to process events will add significant overhead to the
> > record command and will not really solve your race problem.

> I agree that processing events while recording has significant overhead.
> In this case, perf user space needs to know details about the JITed BPF
> program. It is impossible to pass all these details to user space through
> the relatively stable ring_buffer API. Therefore, some processing of the
> data is necessary (get the bpf prog_id from the ring buffer, and then fetch
> program details via BPF_OBJ_GET_INFO_BY_FD).
 
> I have some ideas for processing important data with relatively low overhead.
> Let me try to implement them.

Well, you could have a separate thread processing just those kinds of
events, associate it with a dummy event where you only ask for
PERF_RECORD_BPF_EVENTs.

Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
perf_event_attr:

[root@seventh ~]# perf record -vv -e dummy sleep 01
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------
sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
mmap size 528384B
perf event ring buffer mmapped per cpu
Synthesizing TSC conversion information
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
[root@seventh ~]#

[root@seventh ~]# perf evlist -v
dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@seventh ~]# 

There is ongoing work on dumping one file per cpu and then, at post
processing time, merging all those files to get ordering. One more
file, for these VIP events that require per-event processing, would be
ordered at that time with all the other per-cpu files.

- Arnaldo


* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: jiangyiwen @ 2018-10-17 11:32 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <d16d2052-bfb1-7861-e210-b53b4ea3260c@redhat.com>

On 2018/10/17 17:39, Jason Wang wrote:
> 
> On 2018/10/17 5:27 PM, jiangyiwen wrote:
>> On 2018/10/15 14:12, jiangyiwen wrote:
>>> On 2018/10/15 10:33, Jason Wang wrote:
>>>>
>>>> On 2018/10/15 09:43, jiangyiwen wrote:
>>>>> Hi Stefan & All:
>>>>>
>>>>> Now I find vhost-vsock has two performance problems, even though it
>>>>> is not designed for performance.
>>>>>
>>>>> First, I think vhost-vsock should be faster than vhost-net because it
>>>>> has no TCP/IP stack, but in real tests vhost-net is 5~10 times
>>>>> faster than vhost-vsock. I am currently looking for the reason.
>>>> TCP/IP is not a must for vhost-net.
>>>>
>>>> How do you test and compare the performance?
>>>>
>>>> Thanks
>>>>
>>> I tested the performance using my own test tool, as follows:
>>>
>>> Server                   Client
>>> socket()
>>> bind()
>>> listen()
>>>
>>>                           socket(AF_VSOCK) or socket(AF_INET)
>>> Accept() <-------------->connect()
>>>                           *======Start Record Time======*
>>>                           Call syscall sendfile()
>>> Recv()
>>>                           Send end
>>> Receive end
>>> Send(file_size)
>>>                           Recv(file_size)
>>>                           *======End Record Time======*
>>>
>>> The test result: vhost-vsock is about 500MB/s, and vhost-net is about 2500MB/s.
>>>
>>> By the way, vhost-net uses a single queue.
>>>
>>> Thanks.
>>>
>>>>> Second, vhost-vsock only supports two vqs (tx and rx), which means
>>>>> that multiple sockets in the guest all use the same vq to transmit
>>>>> messages and get responses. So if there are multiple applications
>>>>> in the guest, we should support a "Multiqueue" feature for virtio-vsock.
>>>>>
>>>>> Stefan, have you encountered these problems?
>>>>>
>>>>> Thanks,
>>>>> Yiwen.
>>>>>
>>>>
>>>> .
>>>>
>>>
>> Hi Jason and Stefan,
>>
>> Maybe I find the reason of bad performance.
>>
>> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
>> it will cause the bandwidth is limited to 500~600MB/s. And once I
>> increase to 64k, it can improve about 3 times(~1500MB/s).
> 
> 
> Looks like the value was chosen as a balance between rx buffer size and performance. Always allocating 64K, even for small packets, is wasteful and puts stress on guest memory. Virtio-net tries to avoid this with mergeable rx buffers, which allow a big packet to be scattered across several buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport for vsock.
> 
> What's interesting is that the performance is still behind vhost-net.
> 
> Thanks
> 

Actually, I don't understand why pkt_len is limited to
VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE in virtio_transport_send_pkt_info();
I think it should use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE instead.
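
As a back-of-the-envelope sketch of why the cap matters (the helper name
is hypothetical, the 4K and 64K figures are the ones quoted in this
thread, and the assumption is that each packet carries a roughly fixed
per-packet cost):

```c
#include <stddef.h>

/* Number of packets needed to carry len bytes when a single packet
 * holds at most max_pkt bytes (rounds up).
 */
static size_t packets_needed(size_t len, size_t max_pkt)
{
	return (len + max_pkt - 1) / max_pkt;
}
```

With a 4K cap a 1 MiB write is split into 256 packets; with a 64K cap
it is split into 16, so any fixed per-packet cost is paid 16 times less
often.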

Thanks.

>>
>> By the way, I send to 64K in application once, and I don't use
>> sg_init_one and rewrite function to packet sg list because pkt_len
>> include multiple pages.
>>
>> Thanks,
>> Yiwen.
>>
>>> _______________________________________________
>>> Virtualization mailing list
>>> Virtualization@lists.linux-foundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>>>
>>
> 
> .
> 

^ permalink raw reply

* Re: [PATCH bpf-next v2 13/13] tools/bpf: bpftool: add support for jited func types
From: Edward Cree @ 2018-10-17 11:11 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017072400.2768484-4-yhs@fb.com>

On 17/10/18 08:24, Yonghong Song wrote:
> This patch added support to print function signature
> if btf func_info is available. Note that ksym
> now uses function name instead of prog_name as
> prog_name has a limit of 16 bytes including
> ending '\0'.
>
> The following is a sample output for selftests
> test_btf with file test_btf_haskv.o:
>
>   $ bpftool prog dump jited id 1
>   int _dummy_tracepoint(struct dummy_tracepoint_args * ):
>   bpf_prog_b07ccb89267cf242__dummy_tracepoint:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     3c:   add    $0x28,%rbp
>     40:   leaveq
>     41:   retq
>
>   int test_long_fname_1(struct dummy_tracepoint_args * ):
>   bpf_prog_2dcecc18072623fc_test_long_fname_1:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     3a:   add    $0x28,%rbp
>     3e:   leaveq
>     3f:   retq
>
>   int test_long_fname_2(struct dummy_tracepoint_args * ):
>   bpf_prog_89d64e4abf0f0126_test_long_fname_2:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     80:   add    $0x28,%rbp
>     84:   leaveq
>     85:   retq
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  tools/bpf/bpftool/btf_dumper.c | 96 ++++++++++++++++++++++++++++++++++
>  tools/bpf/bpftool/main.h       |  2 +
>  tools/bpf/bpftool/prog.c       | 54 +++++++++++++++++++
>  3 files changed, 152 insertions(+)
>
> diff --git a/tools/bpf/bpftool/btf_dumper.c b/tools/bpf/bpftool/btf_dumper.c
> index 55bc512a1831..a31df4202335 100644
> --- a/tools/bpf/bpftool/btf_dumper.c
> +++ b/tools/bpf/bpftool/btf_dumper.c
> @@ -249,3 +249,99 @@ int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
>  {
>  	return btf_dumper_do_type(d, type_id, 0, data);
>  }
> +
> +#define BTF_PRINT_STRING(str)						\
> +	{								\
> +		pos += snprintf(func_sig + pos, size - pos, str);	\
> +		if (pos >= size)					\
> +			return -1;					\
> +	}
Usual kernel practice for this sort of macro is to use
    do { \
    } while(0)
 to ensure correct behaviour if the macro is used within another control
 flow statement, e.g.
    if (x)
        BTF_PRINT_STRING(x);
    else
        do_something_else();
 will not compile with the bare braces as the else will be detached.
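 A rough sketch of the wrapped form, using made-up macro and function names
 (not the ones from the patch) and leaving out the early return:

```c
#include <stdio.h>

/* Hypothetical helper, wrapped in do { } while (0) so that it expands
 * to exactly one statement.
 */
#define APPEND_STR(buf, pos, size, str)					\
	do {								\
		(pos) += snprintf((buf) + (pos), (size) - (pos),	\
				  "%s", (str));				\
	} while (0)

/* Because the macro is a single statement, "APPEND_STR(...);" followed
 * by "else" parses as intended.  With bare braces the ';' would end the
 * if statement and leave the else with nothing to attach to.
 */
static int describe(char *buf, int size, int is_ptr)
{
	int pos = 0;

	if (is_ptr)
		APPEND_STR(buf, pos, size, "* ");
	else
		APPEND_STR(buf, pos, size, "int ");
	return pos;
}
```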
> +#define BTF_PRINT_ONE_ARG(fmt, arg)					\
> +	{								\
> +		pos += snprintf(func_sig + pos, size - pos, fmt, arg);	\
> +		if (pos >= size)					\
> +			return -1;					\
> +	}
Any reason for not just using a variadic macro?
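 Something along these lines might do, as a sketch only (SIG_PRINTF and
 print_sig are hypothetical names, not part of the patch); the format
 string is simply the first of the variadic arguments, so this is plain C99:

```c
#include <stdio.h>

/* One macro for both the format-only and the format-plus-arguments
 * case.  Bails out of the calling function with -1 on truncation.
 */
#define SIG_PRINTF(buf, pos, size, ...)					\
	do {								\
		(pos) += snprintf((buf) + (pos), (size) - (pos),	\
				  __VA_ARGS__);				\
		if ((pos) >= (size))					\
			return -1;					\
	} while (0)

static int print_sig(char *buf, int size, const char *name)
{
	int pos = 0;

	SIG_PRINTF(buf, pos, size, "struct %s ", name);	/* with an argument */
	SIG_PRINTF(buf, pos, size, "* ");		/* format only */
	return pos;
}
```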
> +#define BTF_PRINT_TYPE_ONLY(type)					\
> +	{								\
> +		pos = __btf_dumper_type_only(btf, type, func_sig,	\
> +					     pos, size);		\
> +		if (pos == -1)						\
> +			return -1;					\
> +	}
> +
> +static int __btf_dumper_type_only(struct btf *btf, __u32 type_id,
> +				  char *func_sig, int pos, int size)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, type_id);
> +	const struct btf_array *array;
> +	int i, vlen;
> +
> +	switch (BTF_INFO_KIND(t->info)) {
> +	case BTF_KIND_INT:
> +		BTF_PRINT_ONE_ARG("%s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_STRUCT:
> +		BTF_PRINT_ONE_ARG("struct %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_UNION:
> +		BTF_PRINT_ONE_ARG("union %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_ENUM:
> +		BTF_PRINT_ONE_ARG("enum %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_ARRAY:
> +		array = (struct btf_array *)(t + 1);
> +		BTF_PRINT_TYPE_ONLY(array->type);
> +		BTF_PRINT_ONE_ARG("[%d]", array->nelems);
> +		break;
> +	case BTF_KIND_PTR:
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		BTF_PRINT_STRING("* ");
> +		break;
> +	case BTF_KIND_UNKN:
> +	case BTF_KIND_FWD:
> +	case BTF_KIND_TYPEDEF:
> +		return -1;
> +	case BTF_KIND_VOLATILE:
> +		BTF_PRINT_STRING("volatile ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_CONST:
> +		BTF_PRINT_STRING("const ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_RESTRICT:
> +		BTF_PRINT_STRING("restrict ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_FUNC:
> +	case BTF_KIND_FUNC_PROTO:
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		BTF_PRINT_ONE_ARG("%s(", btf__name_by_offset(btf, t->name_off));
> +		vlen = BTF_INFO_VLEN(t->info);
> +		for (i = 0; i < vlen; i++) {
> +			__u32 arg_type = ((__u32 *)(t + 1))[i];
> +
> +			BTF_PRINT_TYPE_ONLY(arg_type);
> +			if (i != (vlen - 1))
> +				BTF_PRINT_STRING(", ");
> +		}
In this kind of loop I find it cleaner to print the comma before the item;
 that way the test becomes i != 0.  Thus:
    for (i = 0; i < vlen; i++) {
        __u32 arg_type = ((__u32 *)(t + 1))[i];

        if (i)
            BTF_PRINT_STRING(", ");
        BTF_PRINT_TYPE_ONLY(arg_type);
    }

-Ed

^ permalink raw reply

* Re: [PATCH bpf-next v2 00/13] bpf: add btf func info support
From: Edward Cree @ 2018-10-17 11:02 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017072315.2766920-1-yhs@fb.com>

I think the BTF work needs to be better documented; at the moment the only way
 to determine how BTF sections are structured is to read through the headers,
 and cross-reference with the DWARF spec to guess at the semantics of various
 fields.  I've been working on adding BTF support to ebpf_asm, and am finding
 the amount of guesswork required very frustrating.
Therefore please make sure that each patch extending the BTF format includes
 documentation patches describing both the layout and the semantics of the new
 extensions.  For example in patch #9 there is no explanation of
 btf_ext_header.line_info_off and btf_ext_header.line_info_len (they're not
 even used by the code, so one cannot reverse-engineer it); while it's fairly
 clear that they indicate the bounds of the line_info subsection, there is no
 specification of what this subsection contains.

-Ed

^ permalink raw reply

* Re: [PATCH net-next] ixgbe: fix XFRM_ALGO dependency
From: Jeff Kirsher @ 2018-10-17 18:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: shannon.nelson, David Miller, Steffen Klassert, Herbert Xu,
	Jesse Brandeburg, Björn Töpel, Alexander Duyck,
	intel-wired-lan, Networking, Linux Kernel Mailing List
In-Reply-To: <CAK8P3a3RiMHVkoA+Wp_abPJ5Fzwk5UdhoOaxt++q4_YnzDRwfA@mail.gmail.com>

On Wed, 2018-10-17 at 18:04 +0200, Arnd Bergmann wrote:
> On Wed, Oct 17, 2018 at 5:53 PM Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > On Tue, 2018-10-16 at 09:35 -0700, Shannon Nelson wrote:
> > > On 10/16/2018 3:03 AM, Arnd Bergmann wrote:
> > > > A separate Kconfig symbol now controls whether we include the ipsec
> > > > offload code. To keep the old behavior, this is left as 'default y'.
> > > > The dependency in XFRM_OFFLOAD still causes a circular dependency but
> > > > is not actually needed because this symbol is not user visible, so
> > > > removing that dependency on top makes it all work.
> > > > 
> > > > Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
> > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > 
> > I agree with Shannon's suggested changes.  Arnd, are you working on v2?
> > Or would you like me to take care of it?
> 
> I was planning to respin it, but didn't get around to it yet, and will
> be travelling for the next week, so I'd welcome if you can take over
> from here. Shannon's comments all make sense to me as well.

Ok, I will run with it and make a v2 for you.

> 
> > > > +config IXGBE_IPSEC
> > > > +   bool "IPSec XFRM cryptography-offload accelaration"
> > > > +   default n
> > > 
> > > remove this "default n" line?
> 
> I meant for this to say "default y", as I said in the changelog,
> but feel free to pick whichever default makes sense to you
> and make the description match ;-)


^ permalink raw reply

* [PATCH] isdn: hfc_{pci,sx}: Avoid empty body if statements and use proper register accessors
From: Nathan Chancellor @ 2018-10-17 18:06 UTC (permalink / raw)
  To: Karsten Keil; +Cc: netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/isdn/hisax/hfc_pci.c:131:34: error: if statement has empty body
[-Werror,-Wempty-body]
        if (Read_hfc(cs, HFCPCI_INT_S1));
                                        ^
drivers/isdn/hisax/hfc_pci.c:131:34: note: put the semicolon on a
separate line to silence this warning

Use the format found in drivers/isdn/hardware/mISDN/hfcpci.c of casting
the return of Read_hfc to void, instead of using an empty if statement.

While we're at it, Masahiro Yamada pointed out that {Read,Write}_hfc
should be using a standard access method in hfc_pci.h. Use the one found
in drivers/isdn/hardware/mISDN/hfc_pci.h.

Link: https://github.com/ClangBuiltLinux/linux/issues/66
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/isdn/hisax/hfc_pci.c | 6 +++---
 drivers/isdn/hisax/hfc_pci.h | 4 ++--
 drivers/isdn/hisax/hfc_sx.c  | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)
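
For readers outside the kernel tree, a user-space sketch of the two
ideas: discarding a read-for-side-effect with a (void) cast, and routing
register access through accessor functions. fake_readb()/fake_writeb()
are stand-ins for readb()/writeb(), and REG_READ/REG_WRITE and
clear_pending() are hypothetical names, not the driver's own.

```c
#include <stdint.h>

/* Stand-ins for the kernel accessors; the real ones take an __iomem
 * pointer and are not available in user space.
 */
static inline uint8_t fake_readb(const volatile uint8_t *addr)
{
	return *addr;
}

static inline void fake_writeb(uint8_t val, volatile uint8_t *addr)
{
	*addr = val;
}

#define REG_READ(base, off)		(fake_readb((base) + (off)))
#define REG_WRITE(base, off, val)	(fake_writeb((val), (base) + (off)))

static uint8_t regs[256];	/* pretend register window */

static void clear_pending(volatile uint8_t *base, unsigned int status_off)
{
	/* The read is done only for its side effect on the device, so the
	 * value is discarded explicitly instead of being wrapped in an
	 * if statement with an empty body.
	 */
	(void) REG_READ(base, status_off);
}
```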

diff --git a/drivers/isdn/hisax/hfc_pci.c b/drivers/isdn/hisax/hfc_pci.c
index 8e5b03161b2f..a63b9155b697 100644
--- a/drivers/isdn/hisax/hfc_pci.c
+++ b/drivers/isdn/hisax/hfc_pci.c
@@ -128,7 +128,7 @@ reset_hfcpci(struct IsdnCardState *cs)
 	Write_hfc(cs, HFCPCI_INT_M1, cs->hw.hfcpci.int_m1);
 
 	/* Clear already pending ints */
-	if (Read_hfc(cs, HFCPCI_INT_S1));
+	(void) Read_hfc(cs, HFCPCI_INT_S1);
 
 	Write_hfc(cs, HFCPCI_STATES, HFCPCI_LOAD_STATE | 2);	/* HFC ST 2 */
 	udelay(10);
@@ -158,7 +158,7 @@ reset_hfcpci(struct IsdnCardState *cs)
 	/* Finally enable IRQ output */
 	cs->hw.hfcpci.int_m2 = HFCPCI_IRQ_ENABLE;
 	Write_hfc(cs, HFCPCI_INT_M2, cs->hw.hfcpci.int_m2);
-	if (Read_hfc(cs, HFCPCI_INT_S1));
+	(void) Read_hfc(cs, HFCPCI_INT_S1);
 }
 
 /***************************************************/
@@ -1537,7 +1537,7 @@ hfcpci_bh(struct work_struct *work)
 					cs->hw.hfcpci.int_m1 &= ~HFCPCI_INTS_TIMER;
 					Write_hfc(cs, HFCPCI_INT_M1, cs->hw.hfcpci.int_m1);
 					/* Clear already pending ints */
-					if (Read_hfc(cs, HFCPCI_INT_S1));
+					(void) Read_hfc(cs, HFCPCI_INT_S1);
 					Write_hfc(cs, HFCPCI_STATES, 4 | HFCPCI_LOAD_STATE);
 					udelay(10);
 					Write_hfc(cs, HFCPCI_STATES, 4);
diff --git a/drivers/isdn/hisax/hfc_pci.h b/drivers/isdn/hisax/hfc_pci.h
index 4e58700a3e61..4c3b3ba35726 100644
--- a/drivers/isdn/hisax/hfc_pci.h
+++ b/drivers/isdn/hisax/hfc_pci.h
@@ -228,8 +228,8 @@ typedef union {
 } fifo_area;
 
 
-#define Write_hfc(a, b, c) (*(((u_char *)a->hw.hfcpci.pci_io) + b) = c)
-#define Read_hfc(a, b) (*(((u_char *)a->hw.hfcpci.pci_io) + b))
+#define Write_hfc(a, b, c) (writeb(c, (a->hw.hfcpci.pci_io) + b))
+#define Read_hfc(a, b) (readb((a->hw.hfcpci.pci_io) + b))
 
 extern void main_irq_hcpci(struct BCState *bcs);
 extern void releasehfcpci(struct IsdnCardState *cs);
diff --git a/drivers/isdn/hisax/hfc_sx.c b/drivers/isdn/hisax/hfc_sx.c
index 4d3b4b2f2612..c4f3f37adfc8 100644
--- a/drivers/isdn/hisax/hfc_sx.c
+++ b/drivers/isdn/hisax/hfc_sx.c
@@ -381,7 +381,7 @@ reset_hfcsx(struct IsdnCardState *cs)
 	Write_hfc(cs, HFCSX_INT_M1, cs->hw.hfcsx.int_m1);
 
 	/* Clear already pending ints */
-	if (Read_hfc(cs, HFCSX_INT_S1));
+	(void) Read_hfc(cs, HFCSX_INT_S1);
 
 	Write_hfc(cs, HFCSX_STATES, HFCSX_LOAD_STATE | 2);	/* HFC ST 2 */
 	udelay(10);
@@ -411,7 +411,7 @@ reset_hfcsx(struct IsdnCardState *cs)
 	/* Finally enable IRQ output */
 	cs->hw.hfcsx.int_m2 = HFCSX_IRQ_ENABLE;
 	Write_hfc(cs, HFCSX_INT_M2, cs->hw.hfcsx.int_m2);
-	if (Read_hfc(cs, HFCSX_INT_S2));
+	(void) Read_hfc(cs, HFCSX_INT_S2);
 }
 
 /***************************************************/
@@ -1288,7 +1288,7 @@ hfcsx_bh(struct work_struct *work)
 					cs->hw.hfcsx.int_m1 &= ~HFCSX_INTS_TIMER;
 					Write_hfc(cs, HFCSX_INT_M1, cs->hw.hfcsx.int_m1);
 					/* Clear already pending ints */
-					if (Read_hfc(cs, HFCSX_INT_S1));
+					(void) Read_hfc(cs, HFCSX_INT_S1);
 
 					Write_hfc(cs, HFCSX_STATES, 4 | HFCSX_LOAD_STATE);
 					udelay(10);
-- 
2.19.1

^ permalink raw reply related

* [PATCH] atm: zatm: Fix empty body Clang warnings
From: Nathan Chancellor @ 2018-10-17 18:04 UTC (permalink / raw)
  To: Chas Williams; +Cc: linux-atm-general, netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/atm/zatm.c:513:7: error: while loop has empty body
[-Werror,-Wempty-body]
        zwait;
             ^
drivers/atm/zatm.c:513:7: note: put the semicolon on a separate line to
silence this warning

Get rid of this warning by using an empty do-while loop. While we're at
it, add parentheses to make it clear that this is a function-like macro.

Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/atm/zatm.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)
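
A stand-alone sketch of the new macro shape, for illustration only:
device_busy() is a hypothetical stand-in for the zin(CMR) & uPD98401_BUSY
test and simply reports "busy" for the first three polls.

```c
static int busy_polls_left = 3;
static int poll_count;

/* Hypothetical status check; counts how often it is polled. */
static int device_busy(void)
{
	poll_count++;
	if (busy_polls_left > 0) {
		busy_polls_left--;
		return 1;
	}
	return 0;
}

/* Function-like, so call sites read "wait_ready();", and the loop body
 * is visibly empty, which keeps -Wempty-body quiet.
 */
#define wait_ready() do {} while (device_busy())
```

The condition is still polled until it turns false, just as with the
old "while (cond);" form.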

diff --git a/drivers/atm/zatm.c b/drivers/atm/zatm.c
index e89146ddede6..d5c76b50d357 100644
--- a/drivers/atm/zatm.c
+++ b/drivers/atm/zatm.c
@@ -126,7 +126,7 @@ static unsigned long dummy[2] = {0,0};
 #define zin_n(r) inl(zatm_dev->base+r*4)
 #define zin(r) inl(zatm_dev->base+uPD98401_##r*4)
 #define zout(v,r) outl(v,zatm_dev->base+uPD98401_##r*4)
-#define zwait while (zin(CMR) & uPD98401_BUSY)
+#define zwait() do {} while (zin(CMR) & uPD98401_BUSY)
 
 /* RX0, RX1, TX0, TX1 */
 static const int mbx_entries[NR_MBX] = { 1024,1024,1024,1024 };
@@ -140,7 +140,7 @@ static const int mbx_esize[NR_MBX] = { 16,16,4,4 }; /* entry size in bytes */
 
 static void zpokel(struct zatm_dev *zatm_dev,u32 value,u32 addr)
 {
-	zwait;
+	zwait();
 	zout(value,CER);
 	zout(uPD98401_IND_ACC | uPD98401_IA_BALL |
 	    (uPD98401_IA_TGT_CM << uPD98401_IA_TGT_SHIFT) | addr,CMR);
@@ -149,10 +149,10 @@ static void zpokel(struct zatm_dev *zatm_dev,u32 value,u32 addr)
 
 static u32 zpeekl(struct zatm_dev *zatm_dev,u32 addr)
 {
-	zwait;
+	zwait();
 	zout(uPD98401_IND_ACC | uPD98401_IA_BALL | uPD98401_IA_RW |
 	  (uPD98401_IA_TGT_CM << uPD98401_IA_TGT_SHIFT) | addr,CMR);
-	zwait;
+	zwait();
 	return zin(CER);
 }
 
@@ -241,7 +241,7 @@ static void refill_pool(struct atm_dev *dev,int pool)
 	}
 	if (first) {
 		spin_lock_irqsave(&zatm_dev->lock, flags);
-		zwait;
+		zwait();
 		zout(virt_to_bus(first),CER);
 		zout(uPD98401_ADD_BAT | (pool << uPD98401_POOL_SHIFT) | count,
 		    CMR);
@@ -508,9 +508,9 @@ static int open_rx_first(struct atm_vcc *vcc)
 	}
 	if (zatm_vcc->pool < 0) return -EMSGSIZE;
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_OPEN_CHAN,CMR);
-	zwait;
+	zwait();
 	DPRINTK("0x%x 0x%x\n",zin(CMR),zin(CER));
 	chan = (zin(CMR) & uPD98401_CHAN_ADDR) >> uPD98401_CHAN_ADDR_SHIFT;
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -571,21 +571,21 @@ static void close_rx(struct atm_vcc *vcc)
 		pos = vcc->vci >> 1;
 		shift = (1-(vcc->vci & 1)) << 4;
 		zpokel(zatm_dev,zpeekl(zatm_dev,pos) & ~(0xffff << shift),pos);
-		zwait;
+		zwait();
 		zout(uPD98401_NOP,CMR);
-		zwait;
+		zwait();
 		zout(uPD98401_NOP,CMR);
 		spin_unlock_irqrestore(&zatm_dev->lock, flags);
 	}
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_DEACT_CHAN | uPD98401_CHAN_RT | (zatm_vcc->rx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	udelay(10); /* why oh why ... ? */
 	zout(uPD98401_CLOSE_CHAN | uPD98401_CHAN_RT | (zatm_vcc->rx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	if (!(zin(CMR) & uPD98401_CHAN_ADDR))
 		printk(KERN_CRIT DEV_LABEL "(itf %d): can't close RX channel "
 		    "%d\n",vcc->dev->number,zatm_vcc->rx_chan);
@@ -699,7 +699,7 @@ printk("NONONONOO!!!!\n");
 	skb_queue_tail(&zatm_vcc->tx_queue,skb);
 	DPRINTK("QRP=0x%08lx\n",zpeekl(zatm_dev,zatm_vcc->tx_chan*VC_SIZE/4+
 	  uPD98401_TXVC_QRP));
-	zwait;
+	zwait();
 	zout(uPD98401_TX_READY | (zatm_vcc->tx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -891,12 +891,12 @@ static void close_tx(struct atm_vcc *vcc)
 	}
 	spin_lock_irqsave(&zatm_dev->lock, flags);
 #if 0
-	zwait;
+	zwait();
 	zout(uPD98401_DEACT_CHAN | (chan << uPD98401_CHAN_ADDR_SHIFT),CMR);
 #endif
-	zwait;
+	zwait();
 	zout(uPD98401_CLOSE_CHAN | (chan << uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	if (!(zin(CMR) & uPD98401_CHAN_ADDR))
 		printk(KERN_CRIT DEV_LABEL "(itf %d): can't close TX channel "
 		    "%d\n",vcc->dev->number,chan);
@@ -926,9 +926,9 @@ static int open_tx_first(struct atm_vcc *vcc)
 	zatm_vcc->tx_chan = 0;
 	if (vcc->qos.txtp.traffic_class == ATM_NONE) return 0;
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_OPEN_CHAN,CMR);
-	zwait;
+	zwait();
 	DPRINTK("0x%x 0x%x\n",zin(CMR),zin(CER));
 	chan = (zin(CMR) & uPD98401_CHAN_ADDR) >> uPD98401_CHAN_ADDR_SHIFT;
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -1557,7 +1557,7 @@ static void zatm_phy_put(struct atm_dev *dev,unsigned char value,
 	struct zatm_dev *zatm_dev;
 
 	zatm_dev = ZATM_DEV(dev);
-	zwait;
+	zwait();
 	zout(value,CER);
 	zout(uPD98401_IND_ACC | uPD98401_IA_B0 |
 	    (uPD98401_IA_TGT_PHY << uPD98401_IA_TGT_SHIFT) | addr,CMR);
@@ -1569,10 +1569,10 @@ static unsigned char zatm_phy_get(struct atm_dev *dev,unsigned long addr)
 	struct zatm_dev *zatm_dev;
 
 	zatm_dev = ZATM_DEV(dev);
-	zwait;
+	zwait();
 	zout(uPD98401_IND_ACC | uPD98401_IA_B0 | uPD98401_IA_RW |
 	  (uPD98401_IA_TGT_PHY << uPD98401_IA_TGT_SHIFT) | addr,CMR);
-	zwait;
+	zwait();
 	return zin(CER) & 0xff;
 }
 
-- 
2.19.1

^ permalink raw reply related

* [PATCH] atm: eni: Move semicolon to a new line after empty for loop
From: Nathan Chancellor @ 2018-10-17 18:03 UTC (permalink / raw)
  To: Chas Williams; +Cc: linux-atm-general, netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/atm/eni.c:244:48: error: for loop has empty body
[-Werror,-Wempty-body]
        for (order = 0; (1 << order) < *size; order++);
                                                      ^
drivers/atm/eni.c:244:48: note: put the semicolon on a separate line to
silence this warning

In this case, that loop is expected to be empty so silence the warning
in the way that Clang suggests.

Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/atm/eni.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
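
For reference, the loop computes the smallest order with
(1 << order) >= *size. A stand-alone sketch of the same pattern follows;
size_to_order() is a hypothetical helper, and it shifts 1UL rather than
1 so that the comparison is done in unsigned long.

```c
/* Smallest order such that (1UL << order) >= size.  Assumes size is
 * well below the width of unsigned long, as it is after the
 * MID_MAX_BUF_SIZE check in the driver.
 */
static int size_to_order(unsigned long size)
{
	int order;

	for (order = 0; (1UL << order) < size; order++)
		;	/* empty body: all the work is in the loop header */
	return order;
}
```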

diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 6470e3c4c990..f8c703426c90 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -241,7 +241,8 @@ static void __iomem *eni_alloc_mem(struct eni_dev *eni_dev, unsigned long *size)
 	len = eni_dev->free_len;
 	if (*size < MID_MIN_BUF_SIZE) *size = MID_MIN_BUF_SIZE;
 	if (*size > MID_MAX_BUF_SIZE) return NULL;
-	for (order = 0; (1 << order) < *size; order++);
+	for (order = 0; (1 << order) < *size; order++)
+		;
 	DPRINTK("trying: %ld->%d\n",*size,order);
 	best_order = 65; /* we don't have more than 2^64 of anything ... */
 	index = 0; /* silence GCC */
-- 
2.19.1

^ permalink raw reply related

* [PATCH V2 net-next] net: ena: Fix Kconfig dependency on X86
From: netanel @ 2018-10-17 10:04 UTC (permalink / raw)
  To: davem, netdev
  Cc: akiyano, alisaidi, Netanel Belgazal, dwmw, zorik, matua, saeedb,
	msw, aliguori, nafea, gtzalik

From: Netanel Belgazal <netanel@amazon.com>

The Kconfig limitation of X86 is too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
index 99b30353541a..9e87d7b8360f 100644
--- a/drivers/net/ethernet/amazon/Kconfig
+++ b/drivers/net/ethernet/amazon/Kconfig
@@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
 
 config ENA_ETHERNET
 	tristate "Elastic Network Adapter (ENA) support"
-	depends on (PCI_MSI && X86)
+	depends on PCI_MSI && !CPU_BIG_ENDIAN
 	---help---
 	  This driver supports Elastic Network Adapter (ENA)"
 
-- 
2.15.2.AMZN

^ permalink raw reply related

* Re: [PATCH net-next] net: ena: Fix Kconfig dependencies X86
From: Belgazal, Netanel @ 2018-10-17 10:01 UTC (permalink / raw)
  To: Sergei Shtylyov, davem@davemloft.net, netdev@vger.kernel.org
  Cc: Kiyanovski, Arthur, Saidi, Ali, Woodhouse, David,
	Machulsky, Zorik, Matushevsky, Alexander, Bshara, Saeed,
	Wilson, Matt, Liguori, Anthony, Bshara, Nafea, Tzalik, Guy
In-Reply-To: <155c40da-faf6-ad9d-4d0b-f64d67217160@cogentembedded.com>

Sure.
I'll remove them and resubmit.

On 10/17/18, 11:37 AM, "Sergei Shtylyov" <sergei.shtylyov@cogentembedded.com> wrote:

    Hello!
    
    On 17.10.2018 11:16, netanel@amazon.com wrote:
    
    > From: Netanel Belgazal <netanel@amazon.com>
    >
    > The Kconfig limitation of X86 is too wide.
    > The ENA driver only requires a little endian dependency.
    >
    > Change the dependency to be on little endian CPU.
    >
    > Signed-off-by: Netanel Belgazal <netanel@amazon.com>
    > ---
    >  drivers/net/ethernet/amazon/Kconfig | 2 +-
    >  1 file changed, 1 insertion(+), 1 deletion(-)
    >
    > diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
    > index 99b30353541a..f4d16c7e104f 100644
    > --- a/drivers/net/ethernet/amazon/Kconfig
    > +++ b/drivers/net/ethernet/amazon/Kconfig
    > @@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
    >
    >  config ENA_ETHERNET
    >  	tristate "Elastic Network Adapter (ENA) support"
    > -	depends on (PCI_MSI && X86)
    > +	depends on (PCI_MSI && !CPU_BIG_ENDIAN)
    
         Parens not needed here. High time to remove them, I think.
    
    [...]
    
    MBR, Sergei
    
    
    


^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17  9:51 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <d16d2052-bfb1-7861-e210-b53b4ea3260c@redhat.com>


On 2018/10/17 5:39 PM, Jason Wang wrote:
>>>
>> Hi Jason and Stefan,
>>
>> Maybe I find the reason of bad performance.
>>
>> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
>> it will cause the bandwidth is limited to 500~600MB/s. And once I
>> increase to 64k, it can improve about 3 times(~1500MB/s).
>
>
> Looks like the value was chosen as a balance between rx buffer size
> and performance. Always allocating 64K, even for small packets, is
> wasteful and puts stress on guest memory. Virtio-net tries to avoid
> this with mergeable rx buffers, which allow a big packet to be
> scattered across several buffers. We can reuse this idea or revisit
> the idea of using virtio-net/vhost-net as a transport for vsock.
>
> What's interesting is that the performance is still behind vhost-net.
>
> Thanks
>
>>
>> By the way, I send to 64K in application once, and I don't use
>> sg_init_one and rewrite function to packet sg list because pkt_len
>> include multiple pages.
>>
>> Thanks,
>> Yiwen. 


Btw, if you're using vsock for transferring large files, it may be more
efficient to implement sendpage() for vsock so that sendfile()/splice()
can work.

Thanks

^ permalink raw reply

* [PATCH net] udp6: fix encap return code for resubmitting
From: Paolo Abeni @ 2018-10-17  9:44 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

The commit eb63f2964dbe ("udp6: add missing checks on edumux packet
processing") used the same return code convention of the ipv4 counterpart,
but ipv6 uses the opposite one: positive values mean resubmit.

This change addresses the issue, using positive return value for
resubmitting. Also update the related comment, which was broken, too.

Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I could not find any in kernel udp6 encap using the above
feature, that would explain why nobody complained so far...
---
 net/ipv6/udp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
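
A toy model of the corrected return path, for illustration only: both
function names are hypothetical, and the only convention assumed is the
one stated in the changelog (a positive value asks the ipv6 caller to
resubmit).

```c
/* Mirrors the fixed tail of the receive helper: pass a positive
 * "resubmit" value through unchanged, report 0 otherwise.
 */
static int unicast_rcv_result(int queue_ret)
{
	/* a return value > 0 means to resubmit the input */
	if (queue_ret > 0)
		return queue_ret;
	return 0;
}

/* What a caller following the ipv6 convention would check. */
static int caller_should_resubmit(int rcv_ret)
{
	return rcv_ret > 0;
}
```

With the old "return -ret" the caller would have seen a negative value
and never resubmitted.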

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 28c4aa5078fc..b36694b6716e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -766,11 +766,9 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 
 	ret = udpv6_queue_rcv_skb(sk, skb);
 
-	/* a return value > 0 means to resubmit the input, but
-	 * it wants the return to be -protocol, or 0
-	 */
+	/* a return value > 0 means to resubmit the input */
 	if (ret > 0)
-		return -ret;
+		return ret;
 	return 0;
 }
 
-- 
2.17.2

^ permalink raw reply related

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17  9:39 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC70069.4000600@huawei.com>


On 2018/10/17 5:27 PM, jiangyiwen wrote:
> On 2018/10/15 14:12, jiangyiwen wrote:
>> On 2018/10/15 10:33, Jason Wang wrote:
>>>
>>> On 2018/10/15 09:43, jiangyiwen wrote:
>>>> Hi Stefan & All:
>>>>
>>>> Now I find vhost-vsock has two performance problems even if it
>>>> is not designed for performance.
>>>>
>>>> First, I think vhost-vsock should faster than vhost-net because it
>>>> is no TCP/IP stack, but the real test result vhost-net is 5~10
>>>> times than vhost-vsock, currently I am looking for the reason.
>>> TCP/IP is not a must for vhost-net.
>>>
>>> How do you test and compare the performance?
>>>
>>> Thanks
>>>
>> I test the performance used my test tool, like follows:
>>
>> Server                   Client
>> socket()
>> bind()
>> listen()
>>
>>                           socket(AF_VSOCK) or socket(AF_INET)
>> Accept() <-------------->connect()
>>                           *======Start Record Time======*
>>                           Call syscall sendfile()
>> Recv()
>>                           Send end
>> Receive end
>> Send(file_size)
>>                           Recv(file_size)
>>                           *======End Record Time======*
>>
>> The test result, vhost-vsock is about 500MB/s, and vhost-net is about 2500MB/s.
>>
>> By the way, vhost-net use single queue.
>>
>> Thanks.
>>
>>>> Second, vhost-vsock only supports two vqs(tx and rx), that means
>>>> if multiple sockets in the guest will use the same vq to transmit
>>>> the message and get the response. So if there are multiple applications
>>>> in the guest, we should support "Multiqueue" feature for Virtio-vsock.
>>>>
>>>> Stefan, have you encountered these problems?
>>>>
>>>> Thanks,
>>>> Yiwen.
>>>>
>>>
>>> .
>>>
>>
> Hi Jason and Stefan,
>
> Maybe I find the reason of bad performance.
>
> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
> it will cause the bandwidth is limited to 500~600MB/s. And once I
> increase to 64k, it can improve about 3 times(~1500MB/s).


Looks like the value was chosen as a balance between rx buffer size and
performance. Always allocating 64K, even for small packets, is wasteful
and puts stress on guest memory. Virtio-net tries to avoid this with
mergeable rx buffers, which allow a big packet to be scattered across
several buffers. We can reuse this idea or revisit the idea of using
virtio-net/vhost-net as a transport for vsock.

What's interesting is that the performance is still behind vhost-net.

Thanks

>
> By the way, I send to 64K in application once, and I don't use
> sg_init_one and rewrite function to packet sg list because pkt_len
> include multiple pages.
>
> Thanks,
> Yiwen.
>
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>>
>

^ permalink raw reply

