Linux virtualization list
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix paravirt fail
From: Peter Zijlstra @ 2016-12-08 15:42 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Jeremy Fitzhardinge, Peter Zijlstra (Intel), Pan Xinhui,
	Alok Kataria, kernel test robot, virtualization, Chris Wright,
	Borislav Petkov, Peter Anvin, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar

Two patches that cure fallout from commit:

  3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")

^ permalink raw reply

* [PATCH 1/2] x86,paravirt: Fix native_patch()
From: Peter Zijlstra @ 2016-12-08 15:42 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Jeremy Fitzhardinge, Peter Zijlstra (Intel), Pan Xinhui,
	Alok Kataria, kernel test robot, virtualization, Chris Wright,
	Borislav Petkov, Peter Anvin, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar
In-Reply-To: <20161208154213.952687487@infradead.org>

[-- Attachment #1: peter_zijlstra-fd6f48529f-_aim7_jobs-per-min_-26_1__regression.patch --]
[-- Type: text/plain, Size: 1713 bytes --]

While chasing a regression I noticed we potentially patch the wrong
code in native_patch().

If we do not select the native code sequence, we must use the default
patcher, not fall-through the switch case.

Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/paravirt_patch_32.c |    4 ++++
 arch/x86/kernel/paravirt_patch_64.c |    4 ++++
 2 files changed, 8 insertions(+)

--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -56,15 +56,19 @@ unsigned native_patch(u8 type, u16 clobb
 				end   = end_pv_lock_ops_queued_spin_unlock;
 				goto patch_site;
 			}
+			goto patch_default;
+
 		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
 			if (pv_is_native_vcpu_is_preempted()) {
 				start = start_pv_lock_ops_vcpu_is_preempted;
 				end   = end_pv_lock_ops_vcpu_is_preempted;
 				goto patch_site;
 			}
+			goto patch_default;
 #endif
 
 	default:
+patch_default:
 		ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
 		break;
 
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -68,15 +68,19 @@ unsigned native_patch(u8 type, u16 clobb
 				end   = end_pv_lock_ops_queued_spin_unlock;
 				goto patch_site;
 			}
+			goto patch_default;
+
 		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
 			if (pv_is_native_vcpu_is_preempted()) {
 				start = start_pv_lock_ops_vcpu_is_preempted;
 				end   = end_pv_lock_ops_vcpu_is_preempted;
 				goto patch_site;
 			}
+			goto patch_default;
 #endif
 
 	default:
+patch_default:
 		ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
 		break;

^ permalink raw reply

* [PATCH 2/2] x86, paravirt: Fix bool return type for PVOP_CALL
From: Peter Zijlstra @ 2016-12-08 15:42 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Jeremy Fitzhardinge, Peter Zijlstra (Intel), Pan Xinhui,
	Alok Kataria, kernel test robot, virtualization, Chris Wright,
	Borislav Petkov, Peter Anvin, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar
In-Reply-To: <20161208154213.952687487@infradead.org>

[-- Attachment #1: peterz-fix-paravirt-bool.patch --]
[-- Type: text/plain, Size: 2945 bytes --]

Commit 3cded4179481 ("x86/paravirt: Optimize native
pv_lock_ops.vcpu_is_preempted()") introduced a paravirt op with bool
return type [*]

It turns out that the PVOP_CALL*() macros miscompile when rettype is
bool. Code that looked like:

   83 ef 01                sub    $0x1,%edi
   ff 15 32 a0 d8 00       callq  *0xd8a032(%rip)        # ffffffff81e28120 <pv_lock_ops+0x20>
   84 c0                   test   %al,%al

ended up looking like so after PVOP_CALL1() was applied:

   83 ef 01                sub    $0x1,%edi
   48 63 ff                movslq %edi,%rdi
   ff 14 25 20 81 e2 81    callq  *0xffffffff81e28120
   48 85 c0                test   %rax,%rax

Note how it tests the whole of %rax, even though a typical bool return
function only sets %al, like:

  0f 95 c0                setne  %al
  c3                      retq

This is because ____PVOP_CALL() does:

		__ret = (rettype)__eax;

and while regular integer type casts truncate the result, a cast to
bool tests for any !0 value. Fix this by explicitly truncating to
sizeof(rettype) before casting.


[*] The actual bug should've been exposed in commit 446f3dc8cc0a
("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM
and Xen guests") but that didn't properly implement the paravirt call.

Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/paravirt_types.h |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -508,6 +508,18 @@ int paravirt_disable_iospace(void);
 #define PVOP_TEST_NULL(op)	((void)op)
 #endif
 
+#define PVOP_RETMASK(rettype)						\
+	({	unsigned long __mask = ~0UL;				\
+		switch (sizeof(rettype)) {				\
+		case 1: __mask =       0xffUL; break;			\
+		case 2: __mask =     0xffffUL; break;			\
+		case 4: __mask = 0xffffffffUL; break;			\
+		default: break;						\
+		}							\
+		__mask;							\
+	})
+
+
 #define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr,		\
 		      pre, post, ...)					\
 	({								\
@@ -535,7 +547,7 @@ int paravirt_disable_iospace(void);
 				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
 				     : "memory", "cc" extra_clbr);	\
-			__ret = (rettype)__eax;				\
+			__ret = (rettype)(__eax & PVOP_RETMASK(rettype));	\
 		}							\
 		__ret;							\
 	})

^ permalink raw reply

* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Michael S. Tsirkin @ 2016-12-08 16:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: kvm@vger.kernel.org, Neil Armstrong, David Airlie,
	linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
	linux-scsi@vger.kernel.org, Christoph Hellwig,
	v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
	linux-kbuild@vger.kernel.org, Jens Axboe, Michal Marek,
	Stefan Hajnoczi <stef>
In-Reply-To: <BLUPR02MB168304F8FBA50C916A6A4E6081840@BLUPR02MB1683.namprd02.prod.outlook.com>

On Thu, Dec 08, 2016 at 06:38:11AM +0000, Bart Van Assche wrote:
> On 12/07/16 21:54, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
> >> Additionally, there are notable exceptions to the rule that most drivers
> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
> >> would remain possible to check such drivers with sparse without enabling
> >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
> >
> > The right thing is probably just to fix these, isn't it?
> > Until then, why not just ignore the warnings?
> 
> Neither option is realistic. With endian-checking enabled the qla2xxx 
> driver triggers so many warnings that it becomes a real challenge to 
> filter the non-endian warnings out manually:
> 
> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
>      $f | &grep -c ': warning:'; done
> 4
> 752

You can always revert this patch in your tree, or whatever.  It does not
look like this will get fixed otherwise.

> If you think it would be easy to fix the endian warnings triggered by 
> the qla2xxx driver, you are welcome to try to fix these.
> 
> Bart.

Yea, this hardware was designed by someone who thought mixing
LE and BE all over the place is a good idea.
But who said it should be easy?

Maybe this change will be enough to motivate the maintainers.

Here's a minor buglet for you as a motivator:

                        if (ct_rsp->header.response !=
                            cpu_to_be16(CT_ACCEPT_RESPONSE)) {
                                ql_dbg(ql_dbg_disc + ql_dbg_buffer, vha, 0x2077,
                                    "%s failed rejected request on port_id: %02x%02x%02x Compeltion status 0x%x, response 0x%x\n",
                                    routine, vha->d_id.b.domain,
                                    vha->d_id.b.area, vha->d_id.b.al_pa, comp_status, ct_rsp->header.response);


response is BE and isn't printed correctly.

another:

        eiter->a.max_frame_size = cpu_to_be32(eiter->a.max_frame_size);
        size += 4 + 4;

        ql_dbg(ql_dbg_disc, vha, 0x20bc,
            "Max_Frame_Size = %x.\n", eiter->a.max_frame_size);

printed too late, it's be by that time.

Here's another suspicious line

        ctio24->u.status1.flags = (atio->u.isp24.attr << 9) |
            cpu_to_le16(CTIO7_FLAGS_STATUS_MODE_1 |
                CTIO7_FLAGS_TERMINATE);

shifting attr by 9 bits gives different results on BE and LE,
mixing it with le16 looks rather strange.

Another:

                ha->flags.dport_enabled =
                    (mid_init_cb->init_cb.firmware_options_1 & BIT_7) != 0;

BIT_7 is native endian, firmware_options_1 is LE I think.



Look at qla27xx_find_valid_image as well.

        if (pri_image_status.signature != QLA27XX_IMG_STATUS_SIGN)

qla27xx_image_status seems to be data coming from flash, but is
somehow native-endian? Maybe ...


        lun = a->u.isp24.fcp_cmnd.lun;

I think lun here is in hardware format (le?), code treats it
as native.


Not to speak about interface abuse all over the place.
How about this:

uint32_t *
qla24xx_read_flash_data(scsi_qla_host_t *vha, uint32_t *dwptr, uint32_t
faddr,
    uint32_t dwords)                     
{
        uint32_t i;                     
        struct qla_hw_data *ha = vha->hw;
                                        
        /* Dword reads to flash. */
        for (i = 0; i < dwords; i++, faddr++)
                dwptr[i] = cpu_to_le32(qla24xx_read_flash_dword(ha,
                    flash_data_addr(ha, faddr)));

        return dwptr;                   
}

OK so we convert to LE ...

                qla24xx_read_flash_data(vha, dcode, faddr, 4); 
    
                risc_addr = be32_to_cpu(dcode[2]);
                *srisc_addr = *srisc_addr == 0 ? risc_addr : *srisc_addr;
                risc_size = be32_to_cpu(dcode[3]);

then happily assume it's BE.

And again, coming from flash, it's unlikely to actually be in the native
endian-ness as callers seem to assume. I'm guessing it's all BE.

I poked at it a bit and was able to cut down # of warnings
from 1700 to 1400 in an hour. Someone familiar with the code
should look at it.

-- 
MST

^ permalink raw reply

* Re: [PATCH 1/2] vhost-vsock: remove unused vq variable
From: David Miller @ 2016-12-08 16:30 UTC (permalink / raw)
  To: bergwolf; +Cc: netdev, kvm, stefanha, virtualization
In-Reply-To: <1481104340-77035-1-git-send-email-bergwolf@gmail.com>

From: Peng Tao <bergwolf@gmail.com>
Date: Wed,  7 Dec 2016 17:52:18 +0800

> Signed-off-by: Peng Tao <bergwolf@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] vhost: remove unnecessary smp_mb from vhost_work_queue
From: David Miller @ 2016-12-08 16:30 UTC (permalink / raw)
  To: bergwolf; +Cc: netdev, kvm, stefanha, virtualization
In-Reply-To: <1481104340-77035-2-git-send-email-bergwolf@gmail.com>

From: Peng Tao <bergwolf@gmail.com>
Date: Wed,  7 Dec 2016 17:52:19 +0800

> test_and_set_bit() already implies a memory barrier.
> 
> Signed-off-by: Peng Tao <bergwolf@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] x86, paravirt: Fix bool return type for PVOP_CALL
From: Pan Xinhui @ 2016-12-08 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel, x86
  Cc: Jeremy Fitzhardinge, Pan Xinhui, Alok Kataria, kernel test robot,
	virtualization, Chris Wright, Borislav Petkov, Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar
In-Reply-To: <20161208154349.346057680@infradead.org>


hi, Peter
	I think I know the point.

then could we just let __eax rettype(here is bool), not unsigned long?
I does not do tests for my thoughts.

@@ -461,7 +461,9 @@ int paravirt_disable_iospace(void);
  #define PVOP_VCALL_ARGS                                                        \
         unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;      \
         register void *__sp asm("esp")
-#define PVOP_CALL_ARGS                 PVOP_VCALL_ARGS
+#define PVOP_CALL_ARGS 					\
+       rettype __eax = __eax, __edx = __edx, __ecx = __ecx;    \
+       register void *__sp asm("esp")

^ permalink raw reply

* Re: [PATCH 1/2] virtio_ring: Do not call dma_map_page if sg is already mapped.
From: Michael S. Tsirkin @ 2016-12-08 16:46 UTC (permalink / raw)
  To: Wendy Liang
  Cc: linux-remoteproc, Wendy Liang, bjorn.andersson, virtualization
In-Reply-To: <1481180353-11139-2-git-send-email-jliang@xilinx.com>

On Wed, Dec 07, 2016 at 10:59:12PM -0800, Wendy Liang wrote:
> If sg is already dma mapped, do not call dma_map_page() in
> vring_map_one_sg().
> 
> In case of rpmsg, rpmsg uses dma_alloc_coherent() to allocate
> memory to share with the remote. There is no pages setup
> in dma_alloc_coherent().
> 
> In this case, we cannot convert the virtual address back to the
> physical address. In this case, we can setup the sg_dma_addr to
> store the DMA address, and also mark the sg is already mapped.
> 
> In the vring, we can detect if the address is already mapped
> by checking the sg_dma_addr. If yes, do not call dma_map_page().
> 
> Signed-off-by: Wendy Liang <jliang@xilinx.com>
> ---
>  drivers/virtio/virtio_ring.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 489bfc6..9793e1f 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -180,6 +180,12 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
>  	if (!vring_use_dma_api(vq->vq.vdev))
>  		return (dma_addr_t)sg_phys(sg);
>  
> +	/* If the sg is already mapped, return the DMA address */

How come we even reach this code for rpmsg?
Does vring_use_dma_api return true for rpmsg?

> +	if (sg_dma_address(sg)) {
> +		sg->length = sg_dma_len(sg);
> +		return sg_dma_address(sg);
> +	}
> +

Is there a rule that says 0 is not a valid address?

>  	/*
>  	 * We can't use dma_map_sg, because we don't use scatterlists in
>  	 * the way it expects (we don't guarantee that the scatterlist
> -- 
> 1.9.1

^ permalink raw reply

* Re: [PATCH 2/2] x86, paravirt: Fix bool return type for PVOP_CALL
From: Peter Zijlstra @ 2016-12-08 16:50 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: Jeremy Fitzhardinge, Pan Xinhui, x86, Alok Kataria, linux-kernel,
	virtualization, Chris Wright, kernel test robot, Borislav Petkov,
	Peter Anvin, Paolo Bonzini, Thomas Gleixner, Ingo Molnar
In-Reply-To: <474f05c8-8bca-f68f-6499-ce7b424ff984@linux.vnet.ibm.com>

On Fri, Dec 09, 2016 at 12:40:35AM +0800, Pan Xinhui wrote:
> 
> hi, Peter
> 	I think I know the point.
> 
> then could we just let __eax rettype(here is bool), not unsigned long?
> I does not do tests for my thoughts.
> 
> @@ -461,7 +461,9 @@ int paravirt_disable_iospace(void);
>  #define PVOP_VCALL_ARGS                                                        \
>         unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;      \
>         register void *__sp asm("esp")
> -#define PVOP_CALL_ARGS                 PVOP_VCALL_ARGS
> +#define PVOP_CALL_ARGS 					\
> +       rettype __eax = __eax, __edx = __edx, __ecx = __ecx;    \
> +       register void *__sp asm("esp")

Doesn't work on i386 where eax is also an argument register.

^ permalink raw reply

* [PATCH-RESEND] vhost-vsock: fix orphan connection reset
From: Peng Tao @ 2016-12-08 17:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, David Miller, kvm, virtualization

local_addr.svm_cid is host cid. We should check guest cid instead,
which is remote_addr.svm_cid. Otherwise we end up resetting all
connections to all guests.

Cc: stable@vger.kernel.org [4.8+]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
---
resending because the last attempt looks to be dropped by vger.
 drivers/vhost/vsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e3b30ea..a504e2e0 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -506,7 +506,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 * executing.
 	 */
 
-	if (!vhost_vsock_get(vsk->local_addr.svm_cid)) {
+	if (!vhost_vsock_get(vsk->remote_addr.svm_cid)) {
 		sock_set_flag(sk, SOCK_DONE);
 		vsk->peer_shutdown = SHUTDOWN_MASK;
 		sk->sk_state = SS_UNCONNECTED;
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 0/4] vsock: cancel connect packets when failing to connect
From: Peng Tao @ 2016-12-08 17:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, Jorgen Hansen, David Miller, kvm, virtualization

Currently, if a connect call fails on a signal or timeout (e.g., guest is still
in the process of starting up), we'll just return to caller and leave the connect
packet queued and they are sent even though the connection is considered a failure,
which can confuse applications with unwanted false connect attempt.

The patchset enables vsock (both host and guest) to cancel queued packets when
a connect attempt is considered to fail.

v3 changelog:
  - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
  - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
v2 changelog:
  - fix queued_replies counting and resume tx/rx when necessary


Peng Tao (4):
  vsock: track pkt owner vsock
  vhost-vsock: add pkt cancel capability
  vsock: add pkt cancel capability
  vsock: cancel packets when failing to connect

 drivers/vhost/vsock.c                   | 41 ++++++++++++++++++++++++++++++++
 include/linux/virtio_vsock.h            |  2 ++
 include/net/af_vsock.h                  |  3 +++
 net/vmw_vsock/af_vsock.c                | 14 +++++++++++
 net/vmw_vsock/virtio_transport.c        | 42 +++++++++++++++++++++++++++++++++
 net/vmw_vsock/virtio_transport_common.c |  7 ++++++
 6 files changed, 109 insertions(+)

-- 
2.7.4

^ permalink raw reply

* [PATCH v3 1/4] vsock: track pkt owner vsock
From: Peng Tao @ 2016-12-08 17:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, Jorgen Hansen, David Miller, kvm, virtualization
In-Reply-To: <1481217156-7160-1-git-send-email-bergwolf@gmail.com>

So that we can cancel a queued pkt later if necessary.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
---
 include/linux/virtio_vsock.h            | 2 ++
 net/vmw_vsock/virtio_transport_common.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 9638bfe..193ad3a 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -48,6 +48,7 @@ struct virtio_vsock_pkt {
 	struct virtio_vsock_hdr	hdr;
 	struct work_struct work;
 	struct list_head list;
+	void *cancel_token; /* only used for cancellation */
 	void *buf;
 	u32 len;
 	u32 off;
@@ -56,6 +57,7 @@ struct virtio_vsock_pkt {
 
 struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
+	struct vsock_sock *vsk;
 	struct msghdr *msg;
 	u32 pkt_len;
 	u16 type;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a53b3a1..ef94eb8 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -57,6 +57,7 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 	pkt->len		= len;
 	pkt->hdr.len		= cpu_to_le32(len);
 	pkt->reply		= info->reply;
+	pkt->cancel_token	= info->vsk;
 
 	if (info->msg && len > 0) {
 		pkt->buf = kmalloc(len, GFP_KERNEL);
@@ -180,6 +181,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
 		.type = type,
+		.vsk = vsk,
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -519,6 +521,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_REQUEST,
 		.type = VIRTIO_VSOCK_TYPE_STREAM,
+		.vsk = vsk,
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -534,6 +537,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 			  VIRTIO_VSOCK_SHUTDOWN_RCV : 0) |
 			 (mode & SEND_SHUTDOWN ?
 			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
+		.vsk = vsk,
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -560,6 +564,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 		.type = VIRTIO_VSOCK_TYPE_STREAM,
 		.msg = msg,
 		.pkt_len = len,
+		.vsk = vsk,
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -581,6 +586,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = VIRTIO_VSOCK_TYPE_STREAM,
 		.reply = !!pkt,
+		.vsk = vsk,
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -826,6 +832,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 		.remote_cid = le32_to_cpu(pkt->hdr.src_cid),
 		.remote_port = le32_to_cpu(pkt->hdr.src_port),
 		.reply = true,
+		.vsk = vsk,
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 2/4] vhost-vsock: add pkt cancel capability
From: Peng Tao @ 2016-12-08 17:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, Jorgen Hansen, David Miller, kvm, virtualization
In-Reply-To: <1481217156-7160-2-git-send-email-bergwolf@gmail.com>

To allow canceling all packets of a connection.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
---
 drivers/vhost/vsock.c  | 41 +++++++++++++++++++++++++++++++++++++++++
 include/net/af_vsock.h |  3 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a504e2e0..db64d51 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 	return len;
 }
 
+static int
+vhost_transport_cancel_pkt(struct vsock_sock *vsk)
+{
+	struct vhost_vsock *vsock;
+	struct virtio_vsock_pkt *pkt, *n;
+	int cnt = 0;
+	LIST_HEAD(freeme);
+
+	/* Find the vhost_vsock according to guest context id  */
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	if (!vsock)
+		return -ENODEV;
+
+	spin_lock_bh(&vsock->send_pkt_list_lock);
+	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
+		if (pkt->cancel_token != (void *)vsk)
+			continue;
+		list_move(&pkt->list, &freeme);
+	}
+	spin_unlock_bh(&vsock->send_pkt_list_lock);
+
+	list_for_each_entry_safe(pkt, n, &freeme, list) {
+		if (pkt->reply)
+			cnt++;
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+
+	if (cnt) {
+		struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+		int new_cnt;
+
+		new_cnt = atomic_sub_return(cnt, &vsock->queued_replies);
+		if (new_cnt + cnt >= tx_vq->num && new_cnt < tx_vq->num)
+			vhost_poll_queue(&tx_vq->poll);
+	}
+
+	return 0;
+}
+
 static struct virtio_vsock_pkt *
 vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		      unsigned int out, unsigned int in)
@@ -664,6 +704,7 @@ static struct virtio_transport vhost_transport = {
 		.release                  = virtio_transport_release,
 		.connect                  = virtio_transport_connect,
 		.shutdown                 = virtio_transport_shutdown,
+		.cancel_pkt               = vhost_transport_cancel_pkt,
 
 		.dgram_enqueue            = virtio_transport_dgram_enqueue,
 		.dgram_dequeue            = virtio_transport_dgram_dequeue,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f275896..ce5f100 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -100,6 +100,9 @@ struct vsock_transport {
 	void (*destruct)(struct vsock_sock *);
 	void (*release)(struct vsock_sock *);
 
+	/* Cancel packets belonging the same vsock */
+	int (*cancel_pkt)(struct vsock_sock *vsk);
+
 	/* Connections. */
 	int (*connect)(struct vsock_sock *);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 3/4] vsock: add pkt cancel capability
From: Peng Tao @ 2016-12-08 17:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, Jorgen Hansen, David Miller, kvm, virtualization
In-Reply-To: <1481217156-7160-3-git-send-email-bergwolf@gmail.com>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
---
 net/vmw_vsock/virtio_transport.c | 42 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 936d7ee..95c1162 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -170,6 +170,47 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 	return len;
 }
 
+static int
+virtio_transport_cancel_pkt(struct vsock_sock *vsk)
+{
+	struct virtio_vsock *vsock;
+	struct virtio_vsock_pkt *pkt, *n;
+	int cnt = 0;
+	LIST_HEAD(freeme);
+
+	vsock = virtio_vsock_get();
+	if (!vsock) {
+		return -ENODEV;
+	}
+
+	spin_lock_bh(&vsock->send_pkt_list_lock);
+	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
+		if (pkt->cancel_token != (void *)vsk)
+			continue;
+		list_move(&pkt->list, &freeme);
+	}
+	spin_unlock_bh(&vsock->send_pkt_list_lock);
+
+	list_for_each_entry_safe(pkt, n, &freeme, list) {
+		if (pkt->reply)
+			cnt++;
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+
+	if (cnt) {
+		struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
+		int new_cnt;
+
+		new_cnt = atomic_sub_return(cnt, &vsock->queued_replies);
+		if (new_cnt + cnt >= virtqueue_get_vring_size(rx_vq) &&
+		    new_cnt < virtqueue_get_vring_size(rx_vq))
+			queue_work(virtio_vsock_workqueue, &vsock->rx_work);
+	}
+
+	return 0;
+}
+
 static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 {
 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
@@ -419,6 +460,7 @@ static struct virtio_transport virtio_transport = {
 		.release                  = virtio_transport_release,
 		.connect                  = virtio_transport_connect,
 		.shutdown                 = virtio_transport_shutdown,
+		.cancel_pkt               = virtio_transport_cancel_pkt,
 
 		.dgram_bind               = virtio_transport_dgram_bind,
 		.dgram_dequeue            = virtio_transport_dgram_dequeue,
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 4/4] vsock: cancel packets when failing to connect
From: Peng Tao @ 2016-12-08 17:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, Jorgen Hansen, David Miller, kvm, virtualization
In-Reply-To: <1481217156-7160-4-git-send-email-bergwolf@gmail.com>

Otherwise we'll leave the packets queued until releasing vsock device.
E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
will get the connect requests from failed host sockets.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
---
 net/vmw_vsock/af_vsock.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 8a398b3..c73b03a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1101,10 +1101,19 @@ static const struct proto_ops vsock_dgram_ops = {
 	.sendpage = sock_no_sendpage,
 };
 
+static int vsock_transport_cancel_pkt(struct vsock_sock *vsk)
+{
+	if (!transport->cancel_pkt)
+		return -EOPNOTSUPP;
+
+	return transport->cancel_pkt(vsk);
+}
+
 static void vsock_connect_timeout(struct work_struct *work)
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	int cancel = 0;
 
 	vsk = container_of(work, struct vsock_sock, dwork.work);
 	sk = sk_vsock(vsk);
@@ -1115,8 +1124,11 @@ static void vsock_connect_timeout(struct work_struct *work)
 		sk->sk_state = SS_UNCONNECTED;
 		sk->sk_err = ETIMEDOUT;
 		sk->sk_error_report(sk);
+		cancel = 1;
 	}
 	release_sock(sk);
+	if (cancel)
+		vsock_transport_cancel_pkt(vsk);
 
 	sock_put(sk);
 }
@@ -1223,11 +1235,13 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 			err = sock_intr_errno(timeout);
 			sk->sk_state = SS_UNCONNECTED;
 			sock->state = SS_UNCONNECTED;
+			vsock_transport_cancel_pkt(vsk);
 			goto out_wait;
 		} else if (timeout == 0) {
 			err = -ETIMEDOUT;
 			sk->sk_state = SS_UNCONNECTED;
 			sock->state = SS_UNCONNECTED;
+			vsock_transport_cancel_pkt(vsk);
 			goto out_wait;
 		}
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH-RESEND] vhost-vsock: fix orphan connection reset
From: David Miller @ 2016-12-09  2:25 UTC (permalink / raw)
  To: bergwolf; +Cc: netdev, kvm, stefanha, virtualization
In-Reply-To: <1481217046-7058-1-git-send-email-bergwolf@gmail.com>

From: Peng Tao <bergwolf@gmail.com>
Date: Fri,  9 Dec 2016 01:10:46 +0800

> local_addr.svm_cid is host cid. We should check guest cid instead,
> which is remote_addr.svm_cid. Otherwise we end up resetting all
> connections to all guests.
> 
> Cc: stable@vger.kernel.org [4.8+]
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Li, Liang Z @ 2016-12-09  3:09 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, kvm@vger.kernel.org
  Cc: virtio-dev@lists.oasis-open.org, mhocko@suse.com, mst@redhat.com,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, dgilbert@redhat.com, pbonzini@redhat.com,
	akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org,
	kirill.shutemov@linux.intel.com
In-Reply-To: <f7b47cc4-ee94-bacb-5a17-d049b402263e@intel.com>

> Subject: Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating
> & fast live migration
> 
> On 12/07/2016 05:35 AM, Li, Liang Z wrote:
> >> Am 30.11.2016 um 09:43 schrieb Liang Li:
> >> IOW in real examples, do we have really large consecutive areas or
> >> are all pages just completely distributed over our memory?
> >
> > The buddy system of Linux kernel memory management shows there
> should
> > be quite a lot of consecutive pages as long as there are a portion of
> > free memory in the guest.
> ...
> > If all pages just completely distributed over our memory, it means the
> > memory fragmentation is very serious, the kernel has the mechanism to
> > avoid this happened.
> 
> While it is correct that the kernel has anti-fragmentation mechanisms, I don't
> think it invalidates the question as to whether a bitmap would be too sparse
> to be effective.
> 
> > In the other hand, the inflating should not happen at this time
> > because the guest is almost 'out of memory'.
> 
> I don't think this is correct.  Most systems try to run with relatively little free
> memory all the time, using the bulk of it as page cache.  We have no reason
> to expect that ballooning will only occur when there is lots of actual free
> memory and that it will not occur when that same memory is in use as page
> cache.
> 

Yes.
> In these patches, you're effectively still sending pfns.  You're just sending
> one pfn per high-order page which is giving a really nice speedup.  IMNHO,
> you're avoiding doing a real bitmap because creating a bitmap means either
> have a really big bitmap, or you would have to do some sorting (or multiple
> passes) of the free lists before populating a smaller bitmap.
> 
> Like David, I would still like to see some data on whether the choice between
> bitmaps and pfn lists is ever clearly in favor of bitmaps.  You haven't
> convinced me, at least, that the data isn't even worth collecting.

I will try to get some data with the real workload and share it with your guys.

Thanks!
Liang

^ permalink raw reply

* RE: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Li, Liang Z @ 2016-12-09  4:45 UTC (permalink / raw)
  To: Andrea Arcangeli, Hansen, Dave
  Cc: mhocko@suse.com, kvm@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com
In-Reply-To: <20161207202824.GH28786@redhat.com>

> > 1. Current patches do a hypercall for each order in the allocator.
> >    This is inefficient, but independent from the underlying data
> >    structure in the ABI, unless bitmaps are in play, which they aren't.
> > 2. Should we have bitmaps in the ABI, even if they are not in use by the
> >    guest implementation today?  Andrea says they have zero benefits
> >    over a pfn/len scheme.  Dave doesn't think they have zero benefits
> >    but isn't that attached to them.  QEMU's handling gets more
> >    complicated when using a bitmap.
> > 3. Should the ABI contain records each with a pfn/len pair or a
> >    pfn/order pair?
> >    3a. 'len' is more flexible, but will always be a power-of-two anyway
> > 	for high-order pages (the common case)
> 
> Len wouldn't be a power of two practically only if we detect adjacent pages
> of smaller order that may merge into larger orders we already allocated (or
> the other way around).
> 
> [addr=2M, len=2M] allocated at order 9 pass [addr=4M, len=1M] allocated at
> order 8 pass -> merge as [addr=2M, len=3M]
> 
> Not sure if it would be worth it, but that unless we do this, page-order or len
> won't make much difference.
> 
> >    3b. if we decide not to have a bitmap, then we basically have plenty
> > 	of space for 'len' and should just do it
> >    3c. It's easiest for the hypervisor to turn pfn/len into the
> >        madvise() calls that it needs.
> >
> > Did I miss anything?
> 
> I think you summarized fine all my arguments in your summary.
> 
> > FWIW, I don't feel that strongly about the bitmap.  Li had one
> > originally, but I think the code thus far has demonstrated a huge
> > benefit without even having a bitmap.
> >
> > I've got no objections to ripping the bitmap out of the ABI.
> 
> I think we need to see a statistic showing the number of bits set in each
> bitmap in average, after some uptime and lru churn, like running stresstest
> app for a while with I/O and then inflate the balloon and
> count:
> 
> 1) how many bits were set vs total number of bits used in bitmaps
> 
> 2) how many times bitmaps were used vs bitmap_len = 0 case of single
>    page
> 
> My guess would be like very low percentage for both points.
> 

> So there is a connection with the MAX_ORDER..0 allocation loop and the ABI
> change, but I agree any of the ABI proposed would still allow for it this logic to
> be used. Bitmap or not bitmap, the loop would still work.

Hi guys,

What's the conclusion of your discussion? 
It seems you want some statistic before deciding whether to  ripping the bitmap from the ABI, am I right?

Thanks!
Liang 

^ permalink raw reply

* Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Dave Hansen @ 2016-12-09  4:53 UTC (permalink / raw)
  To: Li, Liang Z, Andrea Arcangeli
  Cc: mhocko@suse.com, kvm@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E3A14E2AD@SHSMSX104.ccr.corp.intel.com>

On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> What's the conclusion of your discussion? It seems you want some
> statistic before deciding whether to  ripping the bitmap from the
> ABI, am I right?

I think Andrea and David feel pretty strongly that we should remove the
bitmap, unless we have some data to support keeping it.  I don't feel as
strongly about it, but I think their critique of it is pretty valid.  I
think the consensus is that the bitmap needs to go.

The only real question IMNHO is whether we should do a power-of-2 or a
length.  But, if we have 12 bits, then the argument for doing length is
pretty strong.  We don't need anywhere near 12 bits if doing power-of-2.

^ permalink raw reply

* RE: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Li, Liang Z @ 2016-12-09  5:35 UTC (permalink / raw)
  To: Hansen, Dave, Andrea Arcangeli
  Cc: mhocko@suse.com, kvm@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org, dgilbert@redhat.com
In-Reply-To: <060287c7-d1af-45d5-70ea-ad35d4bbeb84@intel.com>

> On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> > What's the conclusion of your discussion? It seems you want some
> > statistic before deciding whether to  ripping the bitmap from the ABI,
> > am I right?
> 
> I think Andrea and David feel pretty strongly that we should remove the
> bitmap, unless we have some data to support keeping it.  I don't feel as
> strongly about it, but I think their critique of it is pretty valid.  I think the
> consensus is that the bitmap needs to go.
> 

Thanks for you clarification.

> The only real question IMNHO is whether we should do a power-of-2 or a
> length.  But, if we have 12 bits, then the argument for doing length is pretty
> strong.  We don't need anywhere near 12 bits if doing power-of-2.
> 
So each item can max represent 16MB Bytes, seems not big enough,
but enough for most case.
Things became much more simple without the bitmap, and I like simple solution too. :)

I will prepare the v6 and remove all the bitmap related stuffs. Thank you all!

Liang

^ permalink raw reply

* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Madhani, Himanshu @ 2016-12-09  6:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Bart Van Assche
  Cc: kvm@vger.kernel.org, Neil Armstrong, David Airlie,
	linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
	linux-scsi@vger.kernel.org, Christoph Hellwig,
	v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
	linux-kbuild@vger.kernel.org, Jens Axboe, Michal Marek,
	Stefan Hajnoczi <stef>
In-Reply-To: <20161208163658-mutt-send-email-mst@kernel.org>

Hi Mike/Bart, 







On 12/8/16, 8:17 AM, "virtualization-bounces@lists.linux-foundation.org on behalf of Michael S. Tsirkin" <virtualization-bounces@lists.linux-foundation.org on behalf of mst@redhat.com> wrote:

>On Thu, Dec 08, 2016 at 06:38:11AM +0000, Bart Van Assche wrote:
>> On 12/07/16 21:54, Michael S. Tsirkin wrote:
>> > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
>> >> Additionally, there are notable exceptions to the rule that most drivers
>> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
>> >> would remain possible to check such drivers with sparse without enabling
>> >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
>> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
>> >
>> > The right thing is probably just to fix these, isn't it?
>> > Until then, why not just ignore the warnings?
>> 
>> Neither option is realistic. With endian-checking enabled the qla2xxx 
>> driver triggers so many warnings that it becomes a real challenge to 
>> filter the non-endian warnings out manually:
>> 
>> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
>>      $f | &grep -c ': warning:'; done
>> 4
>> 752
>
>You can always revert this patch in your tree, or whatever.  It does not
>look like this will get fixed otherwise.
>
>> If you think it would be easy to fix the endian warnings triggered by 
>> the qla2xxx driver, you are welcome to try to fix these.
>> 
>> Bart.
>
>Yea, this hardware was designed by someone who thought mixing
>LE and BE all over the place is a good idea.
>But who said it should be easy?
>
>Maybe this change will be enough to motivate the maintainers.
>
>Here's a minor buglet for you as a motivator:
>
>                        if (ct_rsp->header.response !=
>                            cpu_to_be16(CT_ACCEPT_RESPONSE)) {
>                                ql_dbg(ql_dbg_disc + ql_dbg_buffer, vha, 0x2077,
>                                    "%s failed rejected request on port_id: %02x%02x%02x Compeltion status 0x%x, response 0x%x\n",
>                                    routine, vha->d_id.b.domain,
>                                    vha->d_id.b.area, vha->d_id.b.al_pa, comp_status, ct_rsp->header.response);
>
>
>response is BE and isn't printed correctly.
>
>another:
>
>        eiter->a.max_frame_size = cpu_to_be32(eiter->a.max_frame_size);
>        size += 4 + 4;
>
>        ql_dbg(ql_dbg_disc, vha, 0x20bc,
>            "Max_Frame_Size = %x.\n", eiter->a.max_frame_size);
>
>printed too late, it's be by that time.
>
>Here's another suspicious line
>
>        ctio24->u.status1.flags = (atio->u.isp24.attr << 9) |
>            cpu_to_le16(CTIO7_FLAGS_STATUS_MODE_1 |
>                CTIO7_FLAGS_TERMINATE);
>
>shifting attr by 9 bits gives different results on BE and LE,
>mixing it with le16 looks rather strange.
>
>Another:
>
>                ha->flags.dport_enabled =
>                    (mid_init_cb->init_cb.firmware_options_1 & BIT_7) != 0;
>
>BIT_7 is native endian, firmware_options_1 is LE I think.
>
>
>
>Look at qla27xx_find_valid_image as well.
>
>        if (pri_image_status.signature != QLA27XX_IMG_STATUS_SIGN)
>
>qla27xx_image_status seems to be data coming from flash, but is
>somehow native-endian? Maybe ...
>
>
>        lun = a->u.isp24.fcp_cmnd.lun;
>
>I think lun here is in hardware format (le?), code treats it
>as native.
>
>
>Not to speak about interface abuse all over the place.
>How about this:
>
>uint32_t *
>qla24xx_read_flash_data(scsi_qla_host_t *vha, uint32_t *dwptr, uint32_t
>faddr,
>    uint32_t dwords)                     
>{
>        uint32_t i;                     
>        struct qla_hw_data *ha = vha->hw;
>                                        
>        /* Dword reads to flash. */
>        for (i = 0; i < dwords; i++, faddr++)
>                dwptr[i] = cpu_to_le32(qla24xx_read_flash_dword(ha,
>                    flash_data_addr(ha, faddr)));
>
>        return dwptr;                   
>}
>
>OK so we convert to LE ...
>
>                qla24xx_read_flash_data(vha, dcode, faddr, 4); 
>    
>                risc_addr = be32_to_cpu(dcode[2]);
>                *srisc_addr = *srisc_addr == 0 ? risc_addr : *srisc_addr;
>                risc_size = be32_to_cpu(dcode[3]);
>
>then happily assume it's BE.
>
>And again, coming from flash, it's unlikely to actually be in the native
>endian-ness as callers seem to assume. I'm guessing it's all BE.
>
>I poked at it a bit and was able to cut down # of warnings
>from 1700 to 1400 in an hour. Someone familiar with the code
>should look at it.

We’ll take a look and send patches to resolve these warnings. 

>
>-- 
>MST
>_______________________________________________
>Virtualization mailing list
>Virtualization@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/virtualization
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 2/4] vhost-vsock: add pkt cancel capability
From: Stefan Hajnoczi @ 2016-12-09 10:15 UTC (permalink / raw)
  To: Peng Tao
  Cc: kvm, netdev, virtualization, Stefan Hajnoczi, David Miller,
	Jorgen Hansen
In-Reply-To: <1481217156-7160-3-git-send-email-bergwolf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1323 bytes --]

On Fri, Dec 09, 2016 at 01:12:34AM +0800, Peng Tao wrote:
> To allow canceling all packets of a connection.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
>  drivers/vhost/vsock.c  | 41 +++++++++++++++++++++++++++++++++++++++++
>  include/net/af_vsock.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a504e2e0..db64d51 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>  
> +static int
> +vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct vhost_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	/* Find the vhost_vsock according to guest context id  */
> +	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> +	if (!vsock)
> +		return -ENODEV;
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

It's not necessary to cast to void* in C.  All pointers cast to void*
automatically without compiler warnings.  The warnings and explicit
casts are a C++ thing.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 3/4] vsock: add pkt cancel capability
From: Stefan Hajnoczi @ 2016-12-09 10:16 UTC (permalink / raw)
  To: Peng Tao
  Cc: kvm, netdev, virtualization, Stefan Hajnoczi, David Miller,
	Jorgen Hansen
In-Reply-To: <1481217156-7160-4-git-send-email-bergwolf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1076 bytes --]

On Fri, Dec 09, 2016 at 01:12:35AM +0800, Peng Tao wrote:
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
>  net/vmw_vsock/virtio_transport.c | 42 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 936d7ee..95c1162 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -170,6 +170,47 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  	return len;
>  }
>  
> +static int
> +virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct virtio_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	vsock = virtio_vsock_get();
> +	if (!vsock) {
> +		return -ENODEV;
> +	}
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != (void *)vsk)

The cast is unnecessary here.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 4/4] vsock: cancel packets when failing to connect
From: Stefan Hajnoczi @ 2016-12-09 10:17 UTC (permalink / raw)
  To: Peng Tao
  Cc: kvm, netdev, virtualization, Stefan Hajnoczi, David Miller,
	Jorgen Hansen
In-Reply-To: <1481217156-7160-5-git-send-email-bergwolf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 506 bytes --]

On Fri, Dec 09, 2016 at 01:12:36AM +0800, Peng Tao wrote:
> Otherwise we'll leave the packets queued until releasing vsock device.
> E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
> will get the connect requests from failed host sockets.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Please do not include Reviewed-by: if the patch has undergone
substantial changes.

I am happy with this latest version:
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 0/4] vsock: cancel connect packets when failing to connect
From: Stefan Hajnoczi @ 2016-12-09 10:18 UTC (permalink / raw)
  To: Peng Tao
  Cc: kvm, netdev, virtualization, Stefan Hajnoczi, David Miller,
	Jorgen Hansen
In-Reply-To: <1481217156-7160-1-git-send-email-bergwolf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1557 bytes --]

On Fri, Dec 09, 2016 at 01:12:32AM +0800, Peng Tao wrote:
> Currently, if a connect call fails on a signal or timeout (e.g., guest is still
> in the process of starting up), we'll just return to caller and leave the connect
> packet queued and they are sent even though the connection is considered a failure,
> which can confuse applications with unwanted false connect attempt.
> 
> The patchset enables vsock (both host and guest) to cancel queued packets when
> a connect attempt is considered to fail.
> 
> v3 changelog:
>   - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
> v2 changelog:
>   - fix queued_replies counting and resume tx/rx when necessary
> 
> 
> Peng Tao (4):
>   vsock: track pkt owner vsock
>   vhost-vsock: add pkt cancel capability
>   vsock: add pkt cancel capability
>   vsock: cancel packets when failing to connect
> 
>  drivers/vhost/vsock.c                   | 41 ++++++++++++++++++++++++++++++++
>  include/linux/virtio_vsock.h            |  2 ++
>  include/net/af_vsock.h                  |  3 +++
>  net/vmw_vsock/af_vsock.c                | 14 +++++++++++
>  net/vmw_vsock/virtio_transport.c        | 42 +++++++++++++++++++++++++++++++++
>  net/vmw_vsock/virtio_transport_common.c |  7 ++++++
>  6 files changed, 109 insertions(+)

I'm happy although I pointed out two unnecessary (void*) casts.

Please wait for Jorgen to go happy on the af_vsock.c changes before
applying.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox