* Re: raid5-cache I/O path improvements
[not found] ` <20150908002840.GA3196542@devbig257.prn2.facebook.com>
@ 2015-09-08 6:12 ` Christoph Hellwig
2015-09-08 15:25 ` Tejun Heo
2015-09-08 16:56 ` Shaohua Li
0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-09-08 6:12 UTC (permalink / raw)
To: Shaohua Li
Cc: neilb, linux-raid, Kernel-team, dan.j.williams, Tejun Heo,
Martin K. Petersen, linux-ide
On Mon, Sep 07, 2015 at 05:28:55PM -0700, Shaohua Li wrote:
> Hi Christoph,
> Thanks for this work. Yes, I/O error handling is in the plan. We could
> simply panic (people here like this option) or report the error and bypass
> the log. Either way, an option is good.
I think the sensible thing in general is to fail the I/O. Once we have
a cache device the assumption is that a) write holes are properly handled,
and b) we do all kinds of optimizations based on the presence of the
log device, like not passing through flush requests or skipping resync.
Having the cache device suddenly disappear will always break a) and
require a lot of hairy code, only used in failure cases, to undo the
rest.
> For the patches, FUA writes do simplify things a lot. However, I tried
> it before and the performance is quite bad on SSDs. FUA is off in SATA by
> default, and the emulation is fairly slow because the FLUSH request isn't
> an NCQ command. I tried enabling FUA in SATA too; FUA writes are still
> slow on the SSD I tested. Other than this one, the other patches look good:
Pretty much every SSD (and modern disk drive) supports FUA. Please
benchmark with libata.fua=Y, as I think the simplification is absolutely
worth it. On my SSDs using it gives far lower latency for writes,
never mind NVDIMM where it's also essential, as the flush state machine
increases the write latency by an order of magnitude.
Tejun, do you have any updates on libata vs FUA? We enabled it
by default for a while in 2012, but then Jeff reverted it with a rather
non-descriptive commit message.
Also NVMe or SAS SSDs will benefit heavily from the FUA bit.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 6:12 ` raid5-cache I/O path improvements Christoph Hellwig
@ 2015-09-08 15:25 ` Tejun Heo
2015-09-08 15:26 ` Tejun Heo
2015-09-08 15:40 ` Christoph Hellwig
2015-09-08 16:56 ` Shaohua Li
1 sibling, 2 replies; 9+ messages in thread
From: Tejun Heo @ 2015-09-08 15:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
Hello, Christoph.
On Tue, Sep 08, 2015 at 08:12:15AM +0200, Christoph Hellwig wrote:
> Tejun, do you have any updates on libata vs FUA? We enabled it
> by default for a while in 2012, but then Jeff reverted it with a rather
> non-descriptive commit message.
IIRC, some controllers and/or controllers were choking on it and it
didn't make any noticeable difference on rotating disks. Maybe we can
try again with a controller whitelist and enable it by default on SSDs.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 15:25 ` Tejun Heo
@ 2015-09-08 15:26 ` Tejun Heo
2015-09-08 15:40 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-09-08 15:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
On Tue, Sep 08, 2015 at 11:25:46AM -0400, Tejun Heo wrote:
...
> IIRC, some controllers and/or controllers were choking on it and it
                                ^
                                drives
> didn't make any noticeable difference on rotating disks. Maybe we can
> try again with a controller whitelist and enable it by default on SSDs.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 15:25 ` Tejun Heo
2015-09-08 15:26 ` Tejun Heo
@ 2015-09-08 15:40 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-09-08 15:40 UTC (permalink / raw)
To: Tejun Heo
Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
On Tue, Sep 08, 2015 at 11:25:46AM -0400, Tejun Heo wrote:
> IIRC, some controllers and/or controllers were choking on it and it
> didn't make any noticeable difference on rotating disks. Maybe we can
> try again with a controller whitelist and enable it by default on SSDs.
I guess we could start with AHCI only as a good approximation for a not
too crappy controller and driver.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 6:12 ` raid5-cache I/O path improvements Christoph Hellwig
2015-09-08 15:25 ` Tejun Heo
@ 2015-09-08 16:56 ` Shaohua Li
2015-09-08 17:02 ` Tejun Heo
1 sibling, 1 reply; 9+ messages in thread
From: Shaohua Li @ 2015-09-08 16:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: neilb, linux-raid, Kernel-team, dan.j.williams, Tejun Heo,
Martin K. Petersen, linux-ide
On Tue, Sep 08, 2015 at 08:12:15AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 07, 2015 at 05:28:55PM -0700, Shaohua Li wrote:
> > Hi Christoph,
> > Thanks for this work. Yes, I/O error handling is in the plan. We could
> > simply panic (people here like this option) or report the error and bypass
> > the log. Either way, an option is good.
>
> I think the sensible thing in general is to fail the I/O. Once we have
> a cache device the assumption is that a) write holes are properly handled,
> and b) we do all kinds of optimizations based on the presence of the
> log device, like not passing through flush requests or skipping resync.
>
> Having the cache device suddenly disappear will always break a) and
> require a lot of hairy code, only used in failure cases, to undo the
> rest.
Failing the I/O is ok too.
> > For the patches, FUA writes do simplify things a lot. However, I tried
> > it before and the performance is quite bad on SSDs. FUA is off in SATA by
> > default, and the emulation is fairly slow because the FLUSH request isn't
> > an NCQ command. I tried enabling FUA in SATA too; FUA writes are still
> > slow on the SSD I tested. Other than this one, the other patches look good:
>
> Pretty much every SSD (and modern disk drive) supports FUA. Please
> benchmark with libata.fua=Y, as I think the simplification is absolutely
> worth it. On my SSDs using it gives far lower latency for writes,
> never mind NVDIMM where it's also essential, as the flush state machine
> increases the write latency by an order of magnitude.
>
> Tejun, do you have any updates on libata vs FUA? We enabled it
> by default for a while in 2012, but then Jeff reverted it with a rather
> non-descriptive commit message.
>
> Also NVMe or SAS SSDs will benefit heavily from the FUA bit.
I agree on the benefit of FUA. In the system I'm testing, an Intel SSD
supports FUA but a SanDisk SSD doesn't (and that is the SSD we will
deploy for the log). This is AHCI with libata.fua=1. FUA isn't supported
by every SSD. If the log uses FUA by default, we will issue a lot of
FUA writes and performance will suffer.
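For illustration only, a minimal sketch of the kind of capability check this
implies, assuming the ~4.2-era block layer (q->flush_flags set via
blk_queue_flush()); the helper name is made up, not actual raid5-cache code:

#include <linux/blkdev.h>

/*
 * Sketch: report whether the log device advertises native FUA, so the
 * caller can fall back to plain writes plus an explicit FLUSH when it
 * doesn't (e.g. the SanDisk SSD mentioned above).
 */
static bool log_dev_supports_fua(struct block_device *log_bdev)
{
        return bdev_get_queue(log_bdev)->flush_flags & REQ_FUA;
}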
I'll benchmark on an SSD from another vendor which supports FUA, but FUA
writes had poor performance in my last test.
Thanks,
Shaohua
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 16:56 ` Shaohua Li
@ 2015-09-08 17:02 ` Tejun Heo
2015-09-08 17:07 ` Shaohua Li
0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2015-09-08 17:02 UTC (permalink / raw)
To: Shaohua Li
Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
Hello,
On Tue, Sep 08, 2015 at 09:56:22AM -0700, Shaohua Li wrote:
> I'll benchmark on an SSD from another vendor which supports FUA, but FUA
> writes had poor performance in my last test.
lolwut? Was it slower than write + flush?
thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 17:02 ` Tejun Heo
@ 2015-09-08 17:07 ` Shaohua Li
2015-09-08 17:34 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Shaohua Li @ 2015-09-08 17:07 UTC (permalink / raw)
To: Tejun Heo
Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
On Tue, Sep 08, 2015 at 01:02:26PM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 08, 2015 at 09:56:22AM -0700, Shaohua Li wrote:
> > I'll benchmark on an SSD from another vendor which supports FUA, but FUA
> > writes had poor performance in my last test.
>
> lolwut? Was it slower than write + flush?
I need to double-check. But for write + flush, we aggregate several
writes and then do a flush; for FUA, we do every metadata write with FUA.
So this is not an apples-to-apples comparison.
Thanks,
Shaohua
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 17:07 ` Shaohua Li
@ 2015-09-08 17:34 ` Tejun Heo
2015-09-09 15:59 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2015-09-08 17:34 UTC (permalink / raw)
To: Shaohua Li
Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team, dan.j.williams,
Martin K. Petersen, linux-ide
Hello,
On Tue, Sep 08, 2015 at 10:07:36AM -0700, Shaohua Li wrote:
> I need to double-check. But for write + flush, we aggregate several
> writes and then do a flush; for FUA, we do every metadata write with FUA.
> So this is not an apples-to-apples comparison.
Does that mean that upper layers are taking different actions
depending on whether the underlying device supports FUA? That at
least wasn't the original model I had in mind when implementing
the current incarnation of REQ_FLUSH and FUA. The only difference FUA
was expected to make was optimizing out the flush after REQ_FUA; the
upper layers were expected to issue REQ_FLUSH/FUA the same way whether
the device supports FUA or not.
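For illustration, a minimal sketch of that intended model from the
submitter's side, assuming the ~4.2-era API (WRITE_FLUSH_FUA,
submit_bio(rw, bio)); this is not actual md code:

#include <linux/bio.h>
#include <linux/fs.h>

/*
 * Sketch: the caller always tags a durable metadata write the same way.
 * If the device has native FUA the block layer passes the bits through;
 * otherwise blk-flush emulates them with a preflush and a post-flush.
 * No capability check happens in the upper layer.
 */
static void submit_meta_write(struct bio *bio)
{
        submit_bio(WRITE_FLUSH_FUA, bio);
}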
Hmmm... grep tells me that dm and md actually are branching on whether
the underlying device supports FUA. This is tricky. I didn't even
mean for flush_flags to be used directly by upper layers. For rotational
devices, doing multiple FUAs compared to multiple writes followed by
REQ_FLUSH is probably a lot worse - the head gets moved multiple times,
likely skipping over data which could be written out while traversing,
and it's not like stalling the write pipeline and draining the write
queue has much impact on hard drives.
Maybe it's different on SSDs. I'm not sure how expensive a flush
itself would be, given that a lot of the write cost is paid asynchronously
anyway during GC, but a flush stalls the I/O pipeline and that could be
very noticeable on high-IOPS devices. Also, unless the implementation is
braindead, FUA I/Os are unlikely to be expensive on SSDs, so maybe what
we should do is have the block layer hint to upper layers what's likely
to perform better.
But, ultimately, I don't think it'd be too difficult for high-depth
SSD devices to report write-through without losing any performance one
way or the other, and it'd be great if we can eventually get there.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid5-cache I/O path improvements
2015-09-08 17:34 ` Tejun Heo
@ 2015-09-09 15:59 ` Christoph Hellwig
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-09-09 15:59 UTC (permalink / raw)
To: Tejun Heo
Cc: Shaohua Li, Christoph Hellwig, neilb, linux-raid, Kernel-team,
dan.j.williams, Martin K. Petersen, linux-ide
On Tue, Sep 08, 2015 at 01:34:20PM -0400, Tejun Heo wrote:
> Hmmm... grep tells me that dm and md actually are branching on whether
> the underlying device supports FUA. This is tricky. I didn't even
> mean for flush_flags to be used directly by upper layers. For rotational
> devices, doing multiple FUAs compared to multiple writes followed by
> REQ_FLUSH is probably a lot worse - the head gets moved multiple times,
> likely skipping over data which could be written out while traversing,
> and it's not like stalling the write pipeline and draining the write
> queue has much impact on hard drives.
Well, that's what we'd need to do for the raid cache as well, given
the results that Shaohua sees. Unless you have a good idea for another
way to handle the issue, we'll need to support both behaviors there.
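For illustration, a rough sketch of what supporting both behaviors could
look like for the log writes, again assuming the ~4.2-era API; the function
and the batching are hypothetical, not the actual raid5-cache code:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

/*
 * Sketch: with native FUA every metadata write carries REQ_FUA; without
 * it the writes go out plain and one explicit cache flush covers the
 * whole batch (the write + flush pattern Shaohua benchmarked).  Waiting
 * for the writes to complete before the flush, and error handling, are
 * omitted for brevity.
 */
static void commit_log_batch(struct block_device *log_bdev,
                             struct bio **bios, int nr)
{
        bool has_fua = bdev_get_queue(log_bdev)->flush_flags & REQ_FUA;
        int i;

        for (i = 0; i < nr; i++)
                submit_bio(has_fua ? WRITE_FUA : WRITE, bios[i]);

        if (!has_fua)
                blkdev_issue_flush(log_bdev, GFP_NOIO, NULL);
}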
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-09-09 15:59 UTC | newest]
Thread overview: 9+ messages
[not found] <1441603250-5119-1-git-send-email-hch@lst.de>
[not found] ` <20150908002840.GA3196542@devbig257.prn2.facebook.com>
2015-09-08 6:12 ` raid5-cache I/O path improvements Christoph Hellwig
2015-09-08 15:25 ` Tejun Heo
2015-09-08 15:26 ` Tejun Heo
2015-09-08 15:40 ` Christoph Hellwig
2015-09-08 16:56 ` Shaohua Li
2015-09-08 17:02 ` Tejun Heo
2015-09-08 17:07 ` Shaohua Li
2015-09-08 17:34 ` Tejun Heo
2015-09-09 15:59 ` Christoph Hellwig