qemu-devel.nongnu.org archive mirror
From: Eric Farman <farman@linux.ibm.com>
To: Halil Pasic <pasic@linux.ibm.com>, Cornelia Huck <cohuck@redhat.com>
Cc: Farhan Ali <alifm@linux.ibm.com>,
	linux-s390@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	qemu-s390x@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 1/6] vfio-ccw: make it safe to access channel programs
Date: Wed, 10 Apr 2019 22:59:41 -0400
Message-ID: <4a692c11-edb2-93f9-7d8e-96a73a7ecdae@linux.ibm.com>
In-Reply-To: <20190410013434.7cea1971@oc2783563651>



On 4/9/19 7:34 PM, Halil Pasic wrote:
> On Mon, 8 Apr 2019 19:07:47 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 8 Apr 2019 13:02:12 -0400
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 03/01/2019 04:38 AM, Cornelia Huck wrote:
>>>> When we get a solicited interrupt, the start function may have
>>>> been cleared by a csch, but we still have a channel program
>>>> structure allocated. Make it safe to call the cp accessors in
>>>> any case, so we can call them unconditionally.
>>>>
>>>> While at it, also make sure that functions called from other parts
>>>> of the code return gracefully if the channel program structure
>>>> has not been initialized (even though that is a bug in the caller).
>>>>
>>>> Reviewed-by: Eric Farman <farman@linux.ibm.com>
>>>> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
>>>> ---
>>>
>>> Hi Connie,
>>>
>>> My series of fixes for vfio-ccw depends on this patch as I would like to
>>> call cp_free unconditionally :) (I had developed my code on top of your
>>> patches).
>>>
>>> Could we pick this patch up as well when/if you pick up my patch series?
>>> I am in the process of sending out a v2.
>>>
>>> Regarding this patch, we could merge it as a standalone patch, separate
>>> from this series. Also, the patch LGTM.
>>>
>>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
>>
>> Actually, I wanted to ask how people felt about merging this whole
>> series for the next release :) It would be one thing less on my plate...
>>
> 
> Sorry, I was not able to spend any significant amount of time on this
> lately.
> 
> I gave the combined set (this + Farhan's "vfio-ccw fixes for kernel
> stacktraces" v2) a bit of smoke testing after some minor adjustments
> to make it compile:
> 
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -13,6 +13,7 @@
>   #include <linux/vfio.h>
>   #include <linux/mdev.h>
>   #include <linux/nospec.h>
> +#include <linux/slab.h>
>   
>   #include "vfio_ccw_private.h"
> 
> 

Hrm...  Taking today's master plus the two series you mention (with a
slight adjustment to apply patch 3 of Connie's series, because part of it
was split out a few weeks ago), I don't encounter this.  I tried switching
between SLUB and SLAB, but it still compiles fine.

> I'm just running fio on a pass-through DASD and on some virtio-blk disks
> in parallel. My QEMU is today's vfio-ccw-caps from your repo.
> 
> I see stuff like this:
> qemu-git: vfio-ccw: wirte I/O region failed with errno=16

Without knowing which I/O failed, this is a guessing game.  But I
encountered something similar just now while running fio.

qemu:
2019-04-11T02:06:09.524838Z qemu-system-s390x: vfio-ccw: wirte I/O region failed with errno=16

guest:
[  422.931458] dasd-eckd 0.0.ca8d: An error occurred in the DASD device driver, reason=14 00000000730bbe9a
[  553.741554] dasd-eckd 0.0.ca8e: An error occurred in the DASD device driver, reason=14 00000000e59b81da
[  554.761552] dasd-eckd 0.0.ca8d: An error occurred in the DASD device driver, reason=14 00000000cdf4fb4e
[  554.921518] dasd-eckd 0.0.ca8b: An error occurred in the DASD device driver, reason=14 0000000068775082
[  555.271556] dasd-eckd 0.0.ca8d: ERP 00000000cdf4fb4e has run out of retries and failed
[  555.271786] dasd(eckd): I/O status report for device 0.0.ca8d:
                dasd(eckd): in req: 00000000cdf4fb4e CC:00 FC:00 AC:00 SC:00 DS:00 CS:00 RC:-16
                dasd(eckd): device 0.0.ca8d: Failing CCW:           (null)
                dasd(eckd): SORRY - NO VALID SENSE AVAILABLE
[  555.272214] dasd(eckd): Related CP in req: 00000000cdf4fb4e
                dasd(eckd): CCW 000000006434c30f: 03400000 00000000 DAT:
                dasd(eckd): CCW 000000007a65f7e0: 08000000 70E5B700 DAT:
[  555.272508] dasd(eckd):......


From the associated I/O, I think this is fixed by a series I am nearly
ready to send for review.  I'll try again with those fixes on top of the
two series here and report back.

> [Thread 0x3ff75890910 (LWP 43803) exited]
> [Thread 0x3ff6b7b7910 (LWP 43800) exited]
> dasd-eckd 0.0.1234: An error occurred in the DASD device driver, reason=14 00000000caa27abe
> INFO: task kworker/u6:1:26 blocked for more than 122 seconds.
>        Not tainted 5.1.0-rc3-00217-g6ab18dc #598
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u6:1    D    0    26      2 0x00000000
> Workqueue: writeback wb_workfn (flush-94:0)
> Call Trace:
> ([<0000000000ae23f2>] __schedule+0x4fa/0xc98)
>   [<0000000000ae2bda>] schedule+0x4a/0xb0
>   [<00000000001b30ec>] io_schedule+0x2c/0x50
>   [<000000000071cc9c>] blk_mq_get_tag+0x1bc/0x310
>   [<000000000071571c>] blk_mq_get_request+0x1a4/0x4a8
>   [<0000000000719d38>] blk_mq_make_request+0x100/0x728
>   [<000000000070aa0a>] generic_make_request+0x26a/0x478
>   [<000000000070ac76>] submit_bio+0x5e/0x178
>   [<00000000004cfa2c>] ext4_io_submit+0x74/0x88
>   [<00000000004cfd32>] ext4_bio_write_page+0x2d2/0x4c8
>   [<00000000004aa5b4>] mpage_submit_page+0x74/0xa8
>   [<00000000004aa676>] mpage_process_page_bufs+0x8e/0x1b8
>   [<00000000004aa9bc>] mpage_prepare_extent_to_map+0x21c/0x390
>   [<00000000004b063c>] ext4_writepages+0x4bc/0x11a0
>   [<000000000032ef7a>] do_writepages+0x3a/0xf0
>   [<0000000000416226>] __writeback_single_inode+0x86/0x7a0
>   [<0000000000417154>] writeback_sb_inodes+0x2cc/0x550
>   [<0000000000417476>] __writeback_inodes_wb+0x9e/0xe8
>   [<00000000004179e0>] wb_writeback+0x468/0x598
>   [<0000000000418780>] wb_workfn+0x3b8/0x710
>   [<0000000000199322>] process_one_work+0x25a/0x668
>   [<000000000019977a>] worker_thread+0x4a/0x428
>   [<00000000001a1ae8>] kthread+0x150/0x170
>   [<0000000000aeadda>] kernel_thread_starter+0x6/0xc
>   [<0000000000aeadd4>] kernel_thread_starter+0x0/0xc
> 4 locks held by kworker/u6:1/26:
>   #0: 00000000792cf224 ((wq_completion)writeback){+.+.}, at: process_one_work+0x19c/0x668
>   #1: 000000009888c0e5 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0x19c/0x668
>   #2: 000000002bfb76f0 (&type->s_umount_key#29){++++}, at: trylock_super+0x2e/0xa8
>   #3: 00000000ff47fe1d (&sbi->s_journal_flag_rwsem){.+.+}, at: do_writepages+0x3a/0xf0
> 
> 
> Since I haven't had the time to keep up lately, I will just trust Eric
> and Farhan on whether this should be merged or not. From a quick look at
> the code, and a quick stroll through my remaining memories, I think there
> are a couple of things that I myself would try to solve differently. But
> that is not a valid reason to hold this up.
> 
> I would like to spare everyone the hassle of revisiting my old comments.
> From the stability and utility perspective, I'm pretty convinced we are
> better off with the patches in question than without them.

I agree, both series are an improvement.  I'll focus on both tomorrow.

  - Eric

> 
> TLDR:
> If it is good enough for Eric and Farhan, I have no objections against merging.
> 
> Regards,
> Halil
> 
