Write cache and surface error behaviour

public inbox for linux-scsi@vger.kernel.org

* Write cache and surface error behaviour
@ 2014-07-20 21:54 joystick
  2014-07-21 14:55 ` Dale R. Worley
  2014-07-28 23:57 ` Jeremy Linton
  0 siblings, 2 replies; 5+ messages in thread
From: joystick @ 2014-07-20 21:54 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org

Hello list,
I don't really understand this disk cache thing.
Suppose a disk with write cache enabled of writeback type: Linux 
receives a write completed notification (a message from the disk) when 
the data has reached the cache of the disk. Correct? At that point it is 
not considered an in-flight I/O anymore. Correct?
So what happens when the disk tries to write it to the platter and 
discovers that there is a media error on that sector? (suppose 
relocation does not happen ; maybe sectors exhausted)
Does Linux receive the write error upon the next flush it issues? So the 
error is related to the flush? And what happens if Linux never issues 
such flush?

Thank you
J.


* Re: Write cache and surface error behaviour
  2014-07-20 21:54 Write cache and surface error behaviour joystick
@ 2014-07-21 14:55 ` Dale R. Worley
  2014-07-28 20:37   ` Dale R. Worley
  2014-07-28 23:57 ` Jeremy Linton
  1 sibling, 1 reply; 5+ messages in thread
From: Dale R. Worley @ 2014-07-21 14:55 UTC (permalink / raw)
  To: joystick; +Cc: linux-scsi

> From: joystick <joystick@shiftmail.org>
> 
> I don't really understand this disk cache thing.
> Suppose a disk with write cache enabled of writeback type: Linux 
> receives a write completed notification (a message from the disk) when 
> the data has reached the cache of the disk. Correct? At that point it is 
> not considered an in-flight I/O anymore. Correct?
> So what happens when the disk tries to write it to the platter and 
> discovers that there is a media error on that sector? (suppose 
> relocation does not happen ; maybe sectors exhausted)
> Does Linux receive the write error upon the next flush it issues? So the 
> error is related to the flush? And what happens if Linux never issues 
> such flush?

I'm no expert...  But modern disks play a lot of games beneath the
covers, including patching around bad blocks.  So in most cases, if
the first write shows a media error, the controller will assign a
replacement sector and write the data there.

Of course the risks you discuss still exist.  My guess is that if
people are really worried about their data, they don't use write-back
caching.  I notice that the startup messages of my system talk of
"writethrough" caching, which (IIRC) means that the kernel isn't told
that the write is complete until the controller can guarantee the data
is in non-volatile memory.
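
For anyone who wants to check what the sd driver reports without digging through the boot log, here is a minimal sketch in Python. It assumes a Linux system that exposes the cache_type attribute under /sys/class/scsi_disk; the function name read_cache_types and its sysfs_root parameter are made up for illustration, and the exact strings reported (for example "write back" or "write through") may vary between kernel versions.

```python
from pathlib import Path


def read_cache_types(sysfs_root="/sys/class/scsi_disk"):
    """Return {device_name: cache_type_string} for each disk found.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory; the default is an assumption about a typical
    Linux sysfs layout.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        try:
            # Each disk directory is expected to hold a one-line
            # "cache_type" attribute.
            result[entry.name] = (entry / "cache_type").read_text().strip()
        except OSError:
            # Attribute missing or unreadable: skip this entry.
            continue
    return result


if __name__ == "__main__":
    for dev, mode in read_cache_types().items():
        print(f"{dev}: {mode}")
```

This only reports what the driver believes about the cache mode; it says nothing about how the drive firmware actually behaves.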

In the end, the biggest risk is that the entire disk will suddenly
become unreadable.

Dale


* Re: Write cache and surface error behaviour
  2014-07-21 14:55 ` Dale R. Worley
@ 2014-07-28 20:37   ` Dale R. Worley
  0 siblings, 0 replies; 5+ messages in thread
From: Dale R. Worley @ 2014-07-28 20:37 UTC (permalink / raw)
  To: joystick, linux-scsi

> From: joystick <joystick@shiftmail.org>
> 
> I don't really understand this disk cache thing.
> Suppose a disk with write cache enabled of writeback type: Linux 
> receives a write completed notification (a message from the disk) when 
> the data has reached the cache of the disk. Correct? At that point it is 
> not considered an in-flight I/O anymore. Correct?
> So what happens when the disk tries to write it to the platter and 
> discovers that there is a media error on that sector? (suppose 
> relocation does not happen ; maybe sectors exhausted)
> Does Linux receive the write error upon the next flush it issues? So the 
> error is related to the flush? And what happens if Linux never issues 
> such flush?

Thinking about this some more, there's no particular reason for the
kernel to use write-back caching in the disk, since the kernel itself
maintains a write-back cache.  So the kernel can be quite patient
about when the disk writes the block to the platters.  Given all the
complexities of coordinating with a disk that is doing write-back
caching, it seems like it would be easier for the kernel to tell the
disk to write-through, that is, don't notify the kernel that the write
is done until the data is actually on the platter.  Most of the time,
the kernel can be patient about writing; the only time the kernel has
to care is when an application forces synchronization on the disk I/O
subsystem.  (I vaguely recall this is done only on close() or
fsync().)  And when that happens, the kernel has to know the data is
on the platter.

Dale


* Re: Write cache and surface error behaviour
  2014-07-20 21:54 Write cache and surface error behaviour joystick
  2014-07-21 14:55 ` Dale R. Worley
@ 2014-07-28 23:57 ` Jeremy Linton
  2014-07-29  1:43   ` Douglas Gilbert
  1 sibling, 1 reply; 5+ messages in thread
From: Jeremy Linton @ 2014-07-28 23:57 UTC (permalink / raw)
  To: joystick, linux-scsi@vger.kernel.org

On 7/20/2014 4:54 PM, joystick wrote:
> So what happens when the disk tries to write it to the platter and 
> discovers that there is a media error on that sector? (suppose relocation
> does not happen ; maybe sectors exhausted) Does Linux receive the write
> error upon the next flush it issues? 

	At least for SCSI I believe the situation you describe is covered in the SCSI
specifications as a deferred error. Basically, the device returns a check
condition indicating a deferred failure in response to another command.
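
To make the distinction concrete, here is a small Python sketch of how an initiator might tell a deferred error from a current one by looking at the response code in byte 0 of the returned sense data (0x70/0x72 for current errors, 0x71/0x73 for deferred errors, in fixed and descriptor format respectively). The function name classify_sense is made up for illustration, and this is not how the kernel's own sense handling is written.

```python
def classify_sense(sense: bytes):
    """Return (kind, sense_key) for a SCSI sense buffer.

    kind is "current", "deferred" or "unknown"; sense_key is the 4-bit
    sense key (e.g. 0x3 for MEDIUM ERROR) or None if it cannot be read.
    """
    if len(sense) < 3:
        return ("unknown", None)
    code = sense[0] & 0x7F          # bit 7 is not part of the response code
    if code in (0x70, 0x71):        # fixed-format sense data
        key = sense[2] & 0x0F
    elif code in (0x72, 0x73):      # descriptor-format sense data
        key = sense[1] & 0x0F
    else:
        return ("unknown", None)
    kind = "deferred" if code in (0x71, 0x73) else "current"
    return (kind, key)
```

A deferred MEDIUM ERROR would then show up as ("deferred", 3) on whatever command happens to carry the check condition, which need not be the write that failed.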

	My understanding (and I'm sure others can correct it) is that the device
server can issue these check conditions anytime it wants. The only guarantee is
that data written before the last successful SYNC is on the media (doesn't mean
you can read it!). So, in order to guarantee data is not lost, a system using
writeback should retain all of the writeback data until a successful SYNC CACHE
operation.

	For example, see SPC-4 4.5.7, note 6.

	If you consider what happens during power loss to a write-back cache, it's the
same situation. Bottom line, make sure to issue syncs for data you want to
retain and use a filesystem/device that supports barriers and SYNC CACHE/FLUSH
CACHE correctly. Still, YMMV.
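
The "retain the data until a successful sync" idea can be sketched at application level in Python. This is only an illustration under assumptions: the class name RetainingWriter and its methods are hypothetical, it uses os.pwrite and os.fsync on an ordinary file rather than talking to the device directly, and what a real application should do after a failed flush (rewrite elsewhere, abort, alert) is left open.

```python
import os


class RetainingWriter:
    """Keep a copy of every write until a flush confirms it."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        self.pending = []  # (offset, data) pairs not yet covered by a flush

    def write(self, offset, data):
        done = 0
        while done < len(data):
            # pwrite may write fewer bytes than asked; loop until complete.
            done += os.pwrite(self.fd, data[done:], offset + done)
        # Completion here only means the data was accepted, not that it
        # is on the medium, so keep our own copy.
        self.pending.append((offset, data))

    def flush(self):
        """Return True and drop retained data only if fsync succeeds."""
        try:
            os.fsync(self.fd)
        except OSError:
            # Retained data stays in self.pending for the caller to handle.
            return False
        self.pending.clear()
        return True

    def close(self):
        os.close(self.fd)
```

The point of keeping the data rather than just retrying the flush is that, after a failed flush, the caller cannot assume the lower layers still hold a good copy.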


* Re: Write cache and surface error behaviour
  2014-07-28 23:57 ` Jeremy Linton
@ 2014-07-29  1:43   ` Douglas Gilbert
  0 siblings, 0 replies; 5+ messages in thread
From: Douglas Gilbert @ 2014-07-29  1:43 UTC (permalink / raw)
  To: Jeremy Linton, joystick, linux-scsi@vger.kernel.org

On 14-07-28 04:57 PM, Jeremy Linton wrote:
> On 7/20/2014 4:54 PM, joystick wrote:
>> So what happens when the disk tries to write it to the platter and
>> discovers that there is a media error on that sector? (suppose relocation
>> does not happen ; maybe sectors exhausted) Does Linux receive the write
>> error upon the next flush it issues?
>
> 	At least for SCSI I believe the situation you describe is covered in the SCSI
> specifications as a deferred error. Basically, the device returns a check
> condition indicating a deferred failure in response to another command.
>
> 	My understanding (and I'm sure others can correct it) is that the device
> server can issue these check conditions anytime it wants. The only guarantee is
> that data written before the last successful SYNC is on the media (doesn't mean
> you can read it!). So, in order to guarantee data is not lost, a system using
> writeback should retain all of the writeback data until a successful SYNC CACHE
> operation.
>
> 	For example, see SPC-4 4.5.7, note 6.
>
> 	If you consider what happens during power loss to a write-back cache, it's the
> same situation. Bottom line, make sure to issue syncs for data you want to
> retain and use a filesystem/device that supports barriers and SYNC CACHE/FLUSH
> CACHE correctly. Still, YMMV.

Another possibility is to use the WRITE AND VERIFY commands
which have 10, 16 and 32 byte variants. They always write to
the medium and never (as far as I can see) lead to deferred
errors being generated for some subsequent commands. So if
the write "goes bad" and can't be re-assigned to another
part of the medium (because that is disallowed or resources are
depleted) then the application client will be informed in the
status and sense data for the offending WRITE AND VERIFY.

So a remaining issue is how well the WRITE AND VERIFY command
is supported by modern disks and RAIDs.
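
For reference, a minimal Python sketch of what the 10-byte variant's command descriptor block looks like. It only builds the CDB bytes (opcode 0x2E, a 4-byte big-endian LBA and a 2-byte transfer length in blocks) and does not send anything to a device. The function name build_write_and_verify_10 is made up for illustration, and leaving the flags byte, group number and control byte at zero is an assumption made for simplicity.

```python
import struct

WRITE_AND_VERIFY_10 = 0x2E  # operation code of WRITE AND VERIFY(10)


def build_write_and_verify_10(lba, num_blocks):
    """Return the 10-byte CDB for WRITE AND VERIFY(10).

    Layout: opcode, flags, LBA (4 bytes, big-endian), group number,
    transfer length (2 bytes, big-endian), control.
    """
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in 32 bits; use a larger variant")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("transfer length does not fit in 16 bits")
    flags = 0    # protection/DPO/BYTCHK fields left at zero (assumption)
    group = 0
    control = 0
    return struct.pack(">BBIBHB", WRITE_AND_VERIFY_10, flags, lba,
                       group, num_blocks, control)
```

Actually issuing the command would need a pass-through interface such as SG_IO, or a tool from sg3_utils, and whether a given disk or RAID accepts it is exactly the open question above.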

Doug Gilbert
