[PATCH] scsi: avoid repetitive logging of device offline messages

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] scsi: avoid repetitive logging of device offline messages
@ 2020-03-09 18:14 Ewan D. Milne
  2020-03-09 19:05 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ewan D. Milne @ 2020-03-09 18:14 UTC (permalink / raw)
  To: linux-scsi

Large queues of I/O to offline devices that are eventually
submitted when devices are unblocked result in a many repeated
"rejecting I/O to offline device" messages.  These messages
can fill up the dmesg buffer in crash dumps so no useful
prior messages remain.  In addition, if a serial console
is used, the flood of messages can cause a hard lockup in
the console code.

Introduce a flag indicating the message has already been logged
for the device, and reset the flag when scsi_device_set_state()
changes the device state.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
---
 drivers/scsi/scsi_lib.c    | 8 ++++++--
 include/scsi/scsi_device.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 610ee41..d3a6d97 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 		 * commands.  The device must be brought online
 		 * before trying any recovery commands.
 		 */
-		sdev_printk(KERN_ERR, sdev,
-			    "rejecting I/O to offline device\n");
+		if (!sdev->offline_already) {
+			sdev->offline_already = 1;
+			sdev_printk(KERN_ERR, sdev,
+				    "rejecting I/O to offline device\n");
+		}
 		return BLK_STS_IOERR;
 	case SDEV_DEL:
 		/*
@@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		break;
 
 	}
+	sdev->offline_already = 0;
 	sdev->sdev_state = state;
 	return 0;
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index f8312a3..72987a0 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -204,6 +204,8 @@ struct scsi_device {
 	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for WRITE SAME */
 	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
 					 * creation time */
+	unsigned offline_already:1;	/* Device offline message logged */
+
 	atomic_t disk_events_disable_depth; /* disable depth for disk events */
 
 	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 18:14 [PATCH] scsi: avoid repetitive logging of device offline messages Ewan D. Milne
@ 2020-03-09 19:05 ` James Bottomley
  2020-03-09 20:49   ` Ewan D. Milne
  2020-03-09 19:36 ` Bart Van Assche
  2020-03-09 22:36 ` Laurence Oberman
  2 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2020-03-09 19:05 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi

On Mon, 2020-03-09 at 14:14 -0400, Ewan D. Milne wrote:
> Large queues of I/O to offline devices that are eventually
> submitted when devices are unblocked result in a many repeated
> "rejecting I/O to offline device" messages.  These messages
> can fill up the dmesg buffer in crash dumps so no useful
> prior messages remain.  In addition, if a serial console
> is used, the flood of messages can cause a hard lockup in
> the console code.
> 
> Introduce a flag indicating the message has already been logged
> for the device, and reset the flag when scsi_device_set_state()
> changes the device state.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c    | 8 ++++++--
>  include/scsi/scsi_device.h | 2 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 610ee41..d3a6d97 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device
> *sdev, struct request *req)
>  		 * commands.  The device must be brought online
>  		 * before trying any recovery commands.
>  		 */
> -		sdev_printk(KERN_ERR, sdev,
> -			    "rejecting I/O to offline device\n");
> +		if (!sdev->offline_already) {
> +			sdev->offline_already = 1;
> +			sdev_printk(KERN_ERR, sdev,
> +				    "rejecting I/O to offline
> device\n");
> +		}

Offline->online is a legal transition, so you'll need to clear this
flag when that happens so we get another offline message if it goes
offline again.

James


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 19:05 ` James Bottomley
@ 2020-03-09 20:49   ` Ewan D. Milne
  2020-03-10  3:58     ` Douglas Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Ewan D. Milne @ 2020-03-09 20:49 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

On Mon, 2020-03-09 at 12:05 -0700, James Bottomley wrote:
> On Mon, 2020-03-09 at 14:14 -0400, Ewan D. Milne wrote:
> > Large queues of I/O to offline devices that are eventually
> > submitted when devices are unblocked result in a many repeated
> > "rejecting I/O to offline device" messages.  These messages
> > can fill up the dmesg buffer in crash dumps so no useful
> > prior messages remain.  In addition, if a serial console
> > is used, the flood of messages can cause a hard lockup in
> > the console code.
> > 
> > Introduce a flag indicating the message has already been logged
> > for the device, and reset the flag when scsi_device_set_state()
> > changes the device state.
> > 
> > Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> > ---
> >  drivers/scsi/scsi_lib.c    | 8 ++++++--
> >  include/scsi/scsi_device.h | 2 ++
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 610ee41..d3a6d97 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device
> > *sdev, struct request *req)
> >  		 * commands.  The device must be brought online
> >  		 * before trying any recovery commands.
> >  		 */
> > -		sdev_printk(KERN_ERR, sdev,
> > -			    "rejecting I/O to offline device\n");
> > +		if (!sdev->offline_already) {
> > +			sdev->offline_already = 1;
> > +			sdev_printk(KERN_ERR, sdev,
> > +				    "rejecting I/O to offline
> > device\n");
> > +		}
> 
> Offline->online is a legal transition, so you'll need to clear this
> flag when that happens so we get another offline message if it goes
> offline again.
> 
> James
> 

That's what resetting the flag in scsi_device_set_state() does.
The only thing that worries me is if some driver manipulates the
sdev->sdev_state value directly.

-Ewan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 20:49   ` Ewan D. Milne
@ 2020-03-10  3:58     ` Douglas Gilbert
  0 siblings, 0 replies; 12+ messages in thread
From: Douglas Gilbert @ 2020-03-10  3:58 UTC (permalink / raw)
  To: Ewan D. Milne, James Bottomley, linux-scsi

On 2020-03-09 4:49 p.m., Ewan D. Milne wrote:
> On Mon, 2020-03-09 at 12:05 -0700, James Bottomley wrote:
>> On Mon, 2020-03-09 at 14:14 -0400, Ewan D. Milne wrote:
>>> Large queues of I/O to offline devices that are eventually
>>> submitted when devices are unblocked result in a many repeated
>>> "rejecting I/O to offline device" messages.  These messages
>>> can fill up the dmesg buffer in crash dumps so no useful
>>> prior messages remain.  In addition, if a serial console
>>> is used, the flood of messages can cause a hard lockup in
>>> the console code.
>>>
>>> Introduce a flag indicating the message has already been logged
>>> for the device, and reset the flag when scsi_device_set_state()
>>> changes the device state.
>>>
>>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
>>> ---
>>>   drivers/scsi/scsi_lib.c    | 8 ++++++--
>>>   include/scsi/scsi_device.h | 2 ++
>>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 610ee41..d3a6d97 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device
>>> *sdev, struct request *req)
>>>   		 * commands.  The device must be brought online
>>>   		 * before trying any recovery commands.
>>>   		 */
>>> -		sdev_printk(KERN_ERR, sdev,
>>> -			    "rejecting I/O to offline device\n");
>>> +		if (!sdev->offline_already) {
>>> +			sdev->offline_already = 1;
>>> +			sdev_printk(KERN_ERR, sdev,
>>> +				    "rejecting I/O to offline
>>> device\n");
>>> +		}
>>
>> Offline->online is a legal transition, so you'll need to clear this
>> flag when that happens so we get another offline message if it goes
>> offline again.
>>
>> James
>>
> 
> That's what resetting the flag in scsi_device_set_state() does.
> The only thing that worries me is if some driver manipulates the
> sdev->sdev_state value directly.

Perhaps they could sneak in some c++isms and allow the definition
of struct fields to be preceded by [[protected]]. The idea would be
to complain loudly if that field is directly accessed. The accessor
functions might need something similar (or the same decoration) to
defeat that noise.

In the meantime:

Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 18:14 [PATCH] scsi: avoid repetitive logging of device offline messages Ewan D. Milne
  2020-03-09 19:05 ` James Bottomley
@ 2020-03-09 19:36 ` Bart Van Assche
  2020-03-09 20:54   ` Ewan D. Milne
  2020-03-09 22:36 ` Laurence Oberman
  2 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2020-03-09 19:36 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi

On 3/9/20 11:14 AM, Ewan D. Milne wrote:
> Large queues of I/O to offline devices that are eventually
> submitted when devices are unblocked result in a many repeated
> "rejecting I/O to offline device" messages.  These messages
> can fill up the dmesg buffer in crash dumps so no useful
> prior messages remain.  In addition, if a serial console
> is used, the flood of messages can cause a hard lockup in
> the console code.
> 
> Introduce a flag indicating the message has already been logged
> for the device, and reset the flag when scsi_device_set_state()
> changes the device state.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>   drivers/scsi/scsi_lib.c    | 8 ++++++--
>   include/scsi/scsi_device.h | 2 ++
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 610ee41..d3a6d97 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
>   		 * commands.  The device must be brought online
>   		 * before trying any recovery commands.
>   		 */
> -		sdev_printk(KERN_ERR, sdev,
> -			    "rejecting I/O to offline device\n");
> +		if (!sdev->offline_already) {
> +			sdev->offline_already = 1;
> +			sdev_printk(KERN_ERR, sdev,
> +				    "rejecting I/O to offline device\n");
> +		}
>   		return BLK_STS_IOERR;
>   	case SDEV_DEL:
>   		/*
> @@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>   		break;
>   
>   	}
> +	sdev->offline_already = 0;
>   	sdev->sdev_state = state;
>   	return 0;
>   
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index f8312a3..72987a0 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -204,6 +204,8 @@ struct scsi_device {
>   	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for WRITE SAME */
>   	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
>   					 * creation time */
> +	unsigned offline_already:1;	/* Device offline message logged */
> +
>   	atomic_t disk_events_disable_depth; /* disable depth for disk events */
>   
>   	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */

Bitfields are troublesome in multithreaded software. Has it been 
considered to use rate-limiting instead of introducing a new bitfield 
member?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 19:36 ` Bart Van Assche
@ 2020-03-09 20:54   ` Ewan D. Milne
  2020-03-10  2:02     ` Bart Van Assche
  0 siblings, 1 reply; 12+ messages in thread
From: Ewan D. Milne @ 2020-03-09 20:54 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi

On Mon, 2020-03-09 at 12:36 -0700, Bart Van Assche wrote:
> On 3/9/20 11:14 AM, Ewan D. Milne wrote:
> > Large queues of I/O to offline devices that are eventually
> > submitted when devices are unblocked result in a many repeated
> > "rejecting I/O to offline device" messages.  These messages
> > can fill up the dmesg buffer in crash dumps so no useful
> > prior messages remain.  In addition, if a serial console
> > is used, the flood of messages can cause a hard lockup in
> > the console code.
> > 
> > Introduce a flag indicating the message has already been logged
> > for the device, and reset the flag when scsi_device_set_state()
> > changes the device state.
> > 
> > Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> > ---
> >   drivers/scsi/scsi_lib.c    | 8 ++++++--
> >   include/scsi/scsi_device.h | 2 ++
> >   2 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 610ee41..d3a6d97 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> >   		 * commands.  The device must be brought online
> >   		 * before trying any recovery commands.
> >   		 */
> > -		sdev_printk(KERN_ERR, sdev,
> > -			    "rejecting I/O to offline device\n");
> > +		if (!sdev->offline_already) {
> > +			sdev->offline_already = 1;
> > +			sdev_printk(KERN_ERR, sdev,
> > +				    "rejecting I/O to offline device\n");
> > +		}
> >   		return BLK_STS_IOERR;
> >   	case SDEV_DEL:
> >   		/*
> > @@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> >   		break;
> >   
> >   	}
> > +	sdev->offline_already = 0;
> >   	sdev->sdev_state = state;
> >   	return 0;
> >   
> > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> > index f8312a3..72987a0 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -204,6 +204,8 @@ struct scsi_device {
> >   	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for WRITE SAME */
> >   	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
> >   					 * creation time */
> > +	unsigned offline_already:1;	/* Device offline message logged */
> > +
> >   	atomic_t disk_events_disable_depth; /* disable depth for disk events */
> >   
> >   	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
> 
> Bitfields are troublesome in multithreaded software. Has it been 
> considered to use rate-limiting instead of introducing a new bitfield 
> member?
> 
> Thanks,
> 
> Bart.
> 

I did but printk_ratelimited() does not do what is desired here.
What we want is only a single message per-device.  If we ratelimit
the message instance itself we lose information in the log about which
devices were affected (which makes debugging issues with multipath I/O
much harder).

The only purpose of the flag is to try to suppress duplicate messages,
in the common case it is a single thread submitting the queued I/O which
is going to get rejected.  If multiple threads submit I/O there might
be duplicated messages but that is not all that critical.  Hence the
lack of locking on the flag.

-Ewan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 20:54   ` Ewan D. Milne
@ 2020-03-10  2:02     ` Bart Van Assche
  2020-03-10 16:44       ` Ewan D. Milne
  0 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2020-03-10  2:02 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi

On 2020-03-09 13:54, Ewan D. Milne wrote:
> The only purpose of the flag is to try to suppress duplicate messages,
> in the common case it is a single thread submitting the queued I/O which
> is going to get rejected.  If multiple threads submit I/O there might
> be duplicated messages but that is not all that critical.  Hence the
> lack of locking on the flag.

Hi Ewan,

My concern is that scsi_prep_state_check() may be called from more than
one thread at the same time and that bitfield changes are not thread-safe.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-10  2:02     ` Bart Van Assche
@ 2020-03-10 16:44       ` Ewan D. Milne
  2020-03-11  4:01         ` Bart Van Assche
  0 siblings, 1 reply; 12+ messages in thread
From: Ewan D. Milne @ 2020-03-10 16:44 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi

On Mon, 2020-03-09 at 19:02 -0700, Bart Van Assche wrote:
> On 2020-03-09 13:54, Ewan D. Milne wrote:
> > The only purpose of the flag is to try to suppress duplicate messages,
> > in the common case it is a single thread submitting the queued I/O which
> > is going to get rejected.  If multiple threads submit I/O there might
> > be duplicated messages but that is not all that critical.  Hence the
> > lack of locking on the flag.
> 
> Hi Ewan,
> 
> My concern is that scsi_prep_state_check() may be called from more than
> one thread at the same time and that bitfield changes are not thread-safe.
> 
> Thanks,
> 
> Bart.

Yes, I agree that the flag is not thread-safe, but I don't think that
is a concern.  Because if we get multiple rejecting I/O messages until
the other threads see the flag change we are no worse off than before,
and once the sdev state changes back to SDEV_RUNNING we won't call
scsi_prep_state_check() and examine the flag.

-Ewan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-10 16:44       ` Ewan D. Milne
@ 2020-03-11  4:01         ` Bart Van Assche
  2020-03-11  5:12           ` Douglas Gilbert
  2020-03-11 14:38           ` Ewan D. Milne
  0 siblings, 2 replies; 12+ messages in thread
From: Bart Van Assche @ 2020-03-11  4:01 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi

On 2020-03-10 09:44, Ewan D. Milne wrote:
> On Mon, 2020-03-09 at 19:02 -0700, Bart Van Assche wrote:
>> On 2020-03-09 13:54, Ewan D. Milne wrote:
>>> The only purpose of the flag is to try to suppress duplicate messages,
>>> in the common case it is a single thread submitting the queued I/O which
>>> is going to get rejected.  If multiple threads submit I/O there might
>>> be duplicated messages but that is not all that critical.  Hence the
>>> lack of locking on the flag.
>>
>> My concern is that scsi_prep_state_check() may be called from more than
>> one thread at the same time and that bitfield changes are not thread-safe.
> 
> Yes, I agree that the flag is not thread-safe, but I don't think that
> is a concern.  Because if we get multiple rejecting I/O messages until
> the other threads see the flag change we are no worse off than before,
> and once the sdev state changes back to SDEV_RUNNING we won't call
> scsi_prep_state_check() and examine the flag.

Hi Ewan,

This is the entire list of bitfields in struct scsi_device:

	unsigned removable:1;
	unsigned changed:1;
	unsigned busy:1;
	unsigned lockable:1;
	unsigned locked:1;
	unsigned borken:1;
	unsigned disconnect:1;
	unsigned soft_reset:1;
	unsigned sdtr:1;
	unsigned wdtr:1;
	unsigned ppr:1;
	unsigned tagged_supported:1;
	unsigned simple_tags:1;
	unsigned was_reset:1;
	unsigned expecting_cc_ua:1;
	unsigned use_10_for_rw:1;
	unsigned use_10_for_ms:1;
	unsigned set_dbd_for_ms:1;
	unsigned no_report_opcodes:1;
	unsigned no_write_same:1;
	unsigned use_16_for_rw:1;
	unsigned skip_ms_page_8:1;
	unsigned skip_ms_page_3f:1;
	unsigned skip_vpd_pages:1;
	unsigned try_vpd_pages:1;
	unsigned use_192_bytes_for_3f:1;
	unsigned no_start_on_add:1;
	unsigned allow_restart:1;
	unsigned manage_start_stop:1;
	unsigned start_stop_pwr_cond:1;
	unsigned no_uld_attach:1;
	unsigned select_no_atn:1;
	unsigned fix_capacity:1;
	unsigned guess_capacity:1;
	unsigned retry_hwerror:1;
	unsigned last_sector_bug:1;
	unsigned no_read_disc_info:1;
	unsigned no_read_capacity_16:1;
	unsigned try_rc_10_first:1;
	unsigned security_supported:1;
	unsigned is_visible:1;
	unsigned wce_default_on:1;
	unsigned no_dif:1;
	unsigned broken_fua:1;
	unsigned lun_in_cdb:1;
	unsigned unmap_limit_for_ws:1;
	unsigned rpm_autosuspend:1;

If any thread modifies any of these existing bitfields concurrently with
scsi_prep_state_check(), one of the two modifications will be lost. That
is because compilers implement bitfield changes as follows:

new_value = (old_value & ~(1 << ...)) | (1 << ...);

If two such assignment statements are executed concurrently, both start
from the same 'old_value' and only one of the two changes will happen.
I'm concerned that this patch introduces a maintenance problem in the
long term. Has it been considered to make 'offline_already' a 32-bits
integer variable or a boolean instead of a bitfield? I think such
variables can be modified without discarding values written from another
thread.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-11  4:01         ` Bart Van Assche
@ 2020-03-11  5:12           ` Douglas Gilbert
  2020-03-11 14:38           ` Ewan D. Milne
  1 sibling, 0 replies; 12+ messages in thread
From: Douglas Gilbert @ 2020-03-11  5:12 UTC (permalink / raw)
  To: Bart Van Assche, Ewan D. Milne, linux-scsi

On 2020-03-11 12:01 a.m., Bart Van Assche wrote:
> On 2020-03-10 09:44, Ewan D. Milne wrote:
>> On Mon, 2020-03-09 at 19:02 -0700, Bart Van Assche wrote:
>>> On 2020-03-09 13:54, Ewan D. Milne wrote:
>>>> The only purpose of the flag is to try to suppress duplicate messages,
>>>> in the common case it is a single thread submitting the queued I/O which
>>>> is going to get rejected.  If multiple threads submit I/O there might
>>>> be duplicated messages but that is not all that critical.  Hence the
>>>> lack of locking on the flag.
>>>
>>> My concern is that scsi_prep_state_check() may be called from more than
>>> one thread at the same time and that bitfield changes are not thread-safe.
>>
>> Yes, I agree that the flag is not thread-safe, but I don't think that
>> is a concern.  Because if we get multiple rejecting I/O messages until
>> the other threads see the flag change we are no worse off than before,
>> and once the sdev state changes back to SDEV_RUNNING we won't call
>> scsi_prep_state_check() and examine the flag.
> 
> Hi Ewan,
> 
> This is the entire list of bitfields in struct scsi_device:
> 
> 	unsigned removable:1;
> 	unsigned changed:1;
> 	unsigned busy:1;
> 	unsigned lockable:1;
> 	unsigned locked:1;
> 	unsigned borken:1;
> 	unsigned disconnect:1;
> 	unsigned soft_reset:1;
> 	unsigned sdtr:1;
> 	unsigned wdtr:1;
> 	unsigned ppr:1;
> 	unsigned tagged_supported:1;
> 	unsigned simple_tags:1;
> 	unsigned was_reset:1;
> 	unsigned expecting_cc_ua:1;
> 	unsigned use_10_for_rw:1;
> 	unsigned use_10_for_ms:1;
> 	unsigned set_dbd_for_ms:1;
> 	unsigned no_report_opcodes:1;
> 	unsigned no_write_same:1;
> 	unsigned use_16_for_rw:1;
> 	unsigned skip_ms_page_8:1;
> 	unsigned skip_ms_page_3f:1;
> 	unsigned skip_vpd_pages:1;
> 	unsigned try_vpd_pages:1;
> 	unsigned use_192_bytes_for_3f:1;
> 	unsigned no_start_on_add:1;
> 	unsigned allow_restart:1;
> 	unsigned manage_start_stop:1;
> 	unsigned start_stop_pwr_cond:1;
> 	unsigned no_uld_attach:1;
> 	unsigned select_no_atn:1;
> 	unsigned fix_capacity:1;
> 	unsigned guess_capacity:1;
> 	unsigned retry_hwerror:1;
> 	unsigned last_sector_bug:1;
> 	unsigned no_read_disc_info:1;
> 	unsigned no_read_capacity_16:1;
> 	unsigned try_rc_10_first:1;
> 	unsigned security_supported:1;
> 	unsigned is_visible:1;
> 	unsigned wce_default_on:1;
> 	unsigned no_dif:1;
> 	unsigned broken_fua:1;
> 	unsigned lun_in_cdb:1;
> 	unsigned unmap_limit_for_ws:1;
> 	unsigned rpm_autosuspend:1;
> 
> If any thread modifies any of these existing bitfields concurrently with
> scsi_prep_state_check(), one of the two modifications will be lost. That
> is because compilers implement bitfield changes as follows:
> 
> new_value = (old_value & ~(1 << ...)) | (1 << ...);
> 
> If two such assignment statements are executed concurrently, both start
> from the same 'old_value' and only one of the two changes will happen.
> I'm concerned that this patch introduces a maintenance problem in the
> long term. Has it been considered to make 'offline_already' a 32-bits
> integer variable or a boolean instead of a bitfield? I think such
> variables can be modified without discarding values written from another
> thread.

Most of the stuff there is slow moving data, typically set once at device
creation time. The design predates atomic bitops. I can't see why the
addition of an extra bitfield, whose overwriting is not going to cause
a calamity ***, warrants the proposer having the rewrite this patch.

Doug Gilbert


*** setting offline_already effectively invalidates a lot of other bitfields.
     Still, if the offline_already fails to be set, due to a clash
     (with what?), then at worst the warning message is repeated and the
     patch code tries again to set that bitfield.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-11  4:01         ` Bart Van Assche
  2020-03-11  5:12           ` Douglas Gilbert
@ 2020-03-11 14:38           ` Ewan D. Milne
  1 sibling, 0 replies; 12+ messages in thread
From: Ewan D. Milne @ 2020-03-11 14:38 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi

On Tue, 2020-03-10 at 21:01 -0700, Bart Van Assche wrote:
> 

...

> If any thread modifies any of these existing bitfields concurrently with
> scsi_prep_state_check(), one of the two modifications will be lost. That
> is because compilers implement bitfield changes as follows:
> 
> new_value = (old_value & ~(1 << ...)) | (1 << ...);
> 
> If two such assignment statements are executed concurrently, both start
> from the same 'old_value' and only one of the two changes will happen.
> I'm concerned that this patch introduces a maintenance problem in the
> long term. Has it been considered to make 'offline_already' a 32-bits
> integer variable or a boolean instead of a bitfield? I think such
> variables can be modified without discarding values written from another
> thread.
> 
> Thanks,
> 
> Bart.
> 

I understand your concern about long-term maintenance, and introducing
a pattern that could be used by other code that needs more accurate
state.  (Although, it is likely that a lock or an atomic operation might
be required in other cases).  I'll post a v2 changing this to a boolean.

-Ewan



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: avoid repetitive logging of device offline messages
  2020-03-09 18:14 [PATCH] scsi: avoid repetitive logging of device offline messages Ewan D. Milne
  2020-03-09 19:05 ` James Bottomley
  2020-03-09 19:36 ` Bart Van Assche
@ 2020-03-09 22:36 ` Laurence Oberman
  2 siblings, 0 replies; 12+ messages in thread
From: Laurence Oberman @ 2020-03-09 22:36 UTC (permalink / raw)
  To: Ewan D. Milne, linux-scsi

On Mon, 2020-03-09 at 14:14 -0400, Ewan D. Milne wrote:
> Large queues of I/O to offline devices that are eventually
> submitted when devices are unblocked result in a many repeated
> "rejecting I/O to offline device" messages.  These messages
> can fill up the dmesg buffer in crash dumps so no useful
> prior messages remain.  In addition, if a serial console
> is used, the flood of messages can cause a hard lockup in
> the console code.
> 
> Introduce a flag indicating the message has already been logged
> for the device, and reset the flag when scsi_device_set_state()
> changes the device state.
> 
> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c    | 8 ++++++--
>  include/scsi/scsi_device.h | 2 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 610ee41..d3a6d97 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device
> *sdev, struct request *req)
>  		 * commands.  The device must be brought online
>  		 * before trying any recovery commands.
>  		 */
> -		sdev_printk(KERN_ERR, sdev,
> -			    "rejecting I/O to offline device\n");
> +		if (!sdev->offline_already) {
> +			sdev->offline_already = 1;
> +			sdev_printk(KERN_ERR, sdev,
> +				    "rejecting I/O to offline
> device\n");
> +		}
>  		return BLK_STS_IOERR;
>  	case SDEV_DEL:
>  		/*
> @@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev,
> enum scsi_device_state state)
>  		break;
>  
>  	}
> +	sdev->offline_already = 0;
>  	sdev->sdev_state = state;
>  	return 0;
>  
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index f8312a3..72987a0 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -204,6 +204,8 @@ struct scsi_device {
>  	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for
> WRITE SAME */
>  	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend
> at device
>  					 * creation time */
> +	unsigned offline_already:1;	/* Device offline message
> logged */
> +
>  	atomic_t disk_events_disable_depth; /* disable depth for disk
> events */
>  
>  	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /*
> supported events */

I tested this on a customer issue.
If its accepted please use

Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-03-11 14:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-09 18:14 [PATCH] scsi: avoid repetitive logging of device offline messages Ewan D. Milne
2020-03-09 19:05 ` James Bottomley
2020-03-09 20:49   ` Ewan D. Milne
2020-03-10  3:58     ` Douglas Gilbert
2020-03-09 19:36 ` Bart Van Assche
2020-03-09 20:54   ` Ewan D. Milne
2020-03-10  2:02     ` Bart Van Assche
2020-03-10 16:44       ` Ewan D. Milne
2020-03-11  4:01         ` Bart Van Assche
2020-03-11  5:12           ` Douglas Gilbert
2020-03-11 14:38           ` Ewan D. Milne
2020-03-09 22:36 ` Laurence Oberman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.