* Re: raid5-cache I/O path improvements

From: Christoph Hellwig @ 2015-09-08 6:12 UTC
  To: Shaohua Li
  Cc: neilb, linux-raid, Kernel-team, dan.j.williams, Tejun Heo,
      Martin K. Petersen, linux-ide

On Mon, Sep 07, 2015 at 05:28:55PM -0700, Shaohua Li wrote:
> Hi Christoph,
> Thanks for this work. Yes, I/O error handling is in the plan. We could
> simply panic (people here like this option) or report the error and
> bypass the log. Either way, an option is good.

I think the sensible thing in general is to fail the I/O. Once we have
a cache device the assumption is that a) write holes are properly
handled, and b) we do all kinds of optimizations based on the presence
of the log device, like not passing through flush requests or skipping
resync.

Having the cache device suddenly disappear will always break a) and
require a lot of hairy code, only used in failure cases, to undo the
rest.

> For the patches, the FUA write does simplify things a lot. However, I
> tried it before, and the performance is quite bad on SSDs. FUA is off
> in SATA by default, and the emulation is fairly slow because the FLUSH
> request isn't an NCQ command. I tried enabling FUA in SATA too, but
> FUA writes are still slow on the SSD I tested. Other than this one,
> the other patches look good:

Pretty much every SSD (and modern disk drive) supports FUA. Please
benchmark with libata.fua=Y, as I think the simplification is
absolutely worth it. On my SSDs using it gives far lower latency for
writes, never mind NVDIMM, where it's also essential as the flush state
machine increases the write latency by an order of magnitude.

Tejun, do you have any updates on libata vs FUA? We enabled it by
default for a while in 2012, but then Jeff reverted it with a rather
non-descriptive commit message.

Also NVMe or SAS SSDs will benefit heavily from the FUA bit.

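[Illustrative sketch - hypothetical code, not from the patch set under
discussion. It shows the two ways a single log metadata write can be
made durable, which is the trade-off being debated above, using the
block-layer interfaces of this era (submit_bio() with an rw argument,
blkdev_issue_flush()). The r5l_log_sketch structure and the helper
names are made up for illustration only.]

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

struct r5l_log_sketch {
        struct block_device     *log_bdev;
        sector_t                log_tail;
};

/* Path A: tag the metadata write itself with FUA; no separate flush. */
static void sketch_write_meta_fua(struct r5l_log_sketch *log,
                                  struct page *page, bio_end_io_t *done)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);

        bio->bi_bdev = log->log_bdev;
        bio->bi_iter.bi_sector = log->log_tail;
        bio->bi_end_io = done;
        bio_add_page(bio, page, PAGE_SIZE, 0);
        submit_bio(WRITE_FUA, bio);     /* durable when the bio completes */
}

/* Path B: plain write, followed by an explicit cache flush. */
static void sketch_write_meta_flush(struct r5l_log_sketch *log,
                                    struct page *page, bio_end_io_t *done)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);

        bio->bi_bdev = log->log_bdev;
        bio->bi_iter.bi_sector = log->log_tail;
        bio->bi_end_io = done;
        bio_add_page(bio, page, PAGE_SIZE, 0);
        submit_bio(WRITE, bio);
        /*
         * A real implementation would issue the flush from its I/O
         * completion state machine after the write finishes; it is
         * done inline here only to keep the sketch short.
         */
        blkdev_issue_flush(log->log_bdev, GFP_NOIO, NULL);
}
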
* Re: raid5-cache I/O path improvements

From: Tejun Heo @ 2015-09-08 15:25 UTC
  To: Christoph Hellwig
  Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
      Martin K. Petersen, linux-ide

Hello, Christoph.

On Tue, Sep 08, 2015 at 08:12:15AM +0200, Christoph Hellwig wrote:
> Tejun, do you have any updates on libata vs FUA? We enabled it by
> default for a while in 2012, but then Jeff reverted it with a rather
> non-descriptive commit message.

IIRC, some controllers and/or controllers were choking on it and it
didn't make any noticeable difference on rotating disks. Maybe we can
try again with a controller whitelist and enabling it by default on
SSDs.

Thanks.

--
tejun

* Re: raid5-cache I/O path improvements

From: Tejun Heo @ 2015-09-08 15:26 UTC
  To: Christoph Hellwig
  Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
      Martin K. Petersen, linux-ide

On Tue, Sep 08, 2015 at 11:25:46AM -0400, Tejun Heo wrote:
...
> IIRC, some controllers and/or controllers were choking on it and it
                                ^ drives
> didn't make any noticeable difference on rotating disks. Maybe we can
> try again with a controller whitelist and enabling it by default on
> SSDs.

--
tejun

* Re: raid5-cache I/O path improvements

From: Christoph Hellwig @ 2015-09-08 15:40 UTC
  To: Tejun Heo
  Cc: Shaohua Li, neilb, linux-raid, Kernel-team, dan.j.williams,
      Martin K. Petersen, linux-ide

On Tue, Sep 08, 2015 at 11:25:46AM -0400, Tejun Heo wrote:
> IIRC, some controllers and/or controllers were choking on it and it
> didn't make any noticeable difference on rotating disks. Maybe we can
> try again with a controller whitelist and enabling it by default on
> SSDs.

I guess we could start with AHCI only, as a good approximation for a
not-too-crappy controller and driver.

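[Illustrative sketch - hypothetical policy code, not an actual libata
change. ata_id_has_fua(), ata_id_is_ssd() and the libata_fua ("fua")
module parameter exist in libata at this time; the ata_dev_may_use_fua()
helper, and passing an "is this AHCI?" flag in from the caller, are
assumptions made only to show what the whitelist idea above could look
like.]

#include <linux/ata.h>
#include <linux/libata.h>

extern int libata_fua;          /* the libata.fua module parameter */

static bool ata_dev_may_use_fua(struct ata_device *dev, bool host_is_ahci)
{
        /* The device must actually implement the FUA write commands. */
        if (!ata_id_has_fua(dev->id))
                return false;

        /* An explicit libata.fua=1 keeps the existing opt-in behaviour. */
        if (libata_fua)
                return true;

        /*
         * Default policy sketched in this thread: only AHCI (a rough
         * proxy for a sane controller/driver pair) and only SSDs,
         * where per-command FUA is expected to be cheap.
         */
        return host_is_ahci && ata_id_is_ssd(dev->id);
}
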
* Re: raid5-cache I/O path improvements

From: Shaohua Li @ 2015-09-08 16:56 UTC
  To: Christoph Hellwig
  Cc: neilb, linux-raid, Kernel-team, dan.j.williams, Tejun Heo,
      Martin K. Petersen, linux-ide

On Tue, Sep 08, 2015 at 08:12:15AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 07, 2015 at 05:28:55PM -0700, Shaohua Li wrote:
> > Hi Christoph,
> > Thanks for this work. Yes, I/O error handling is in the plan. We
> > could simply panic (people here like this option) or report the
> > error and bypass the log. Either way, an option is good.
>
> I think the sensible thing in general is to fail the I/O. Once we have
> a cache device the assumption is that a) write holes are properly
> handled, and b) we do all kinds of optimizations based on the presence
> of the log device, like not passing through flush requests or skipping
> resync.
>
> Having the cache device suddenly disappear will always break a) and
> require a lot of hairy code, only used in failure cases, to undo the
> rest.

Failing the I/O is ok too.

> > For the patches, the FUA write does simplify things a lot. However,
> > I tried it before, and the performance is quite bad on SSDs. FUA is
> > off in SATA by default, and the emulation is fairly slow because the
> > FLUSH request isn't an NCQ command. I tried enabling FUA in SATA
> > too, but FUA writes are still slow on the SSD I tested. Other than
> > this one, the other patches look good:
>
> Pretty much every SSD (and modern disk drive) supports FUA. Please
> benchmark with libata.fua=Y, as I think the simplification is
> absolutely worth it. On my SSDs using it gives far lower latency for
> writes, never mind NVDIMM, where it's also essential as the flush
> state machine increases the write latency by an order of magnitude.
>
> Tejun, do you have any updates on libata vs FUA? We enabled it by
> default for a while in 2012, but then Jeff reverted it with a rather
> non-descriptive commit message.
>
> Also NVMe or SAS SSDs will benefit heavily from the FUA bit.

I agree on the benefit of FUA. In the system I'm testing, an Intel SSD
supports FUA and a SanDisk SSD doesn't (that is the SSD we will deploy
for the log). This is AHCI with libata.fua=1. FUA isn't supported by
every SSD. If the log uses FUA by default, we will issue a lot of FUA
writes and performance will suffer.

I'll benchmark on an SSD from another vendor, which supports FUA, but
FUA writes had poor performance in my last test.

Thanks,
Shaohua

* Re: raid5-cache I/O path improvements

From: Tejun Heo @ 2015-09-08 17:02 UTC
  To: Shaohua Li
  Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team,
      dan.j.williams, Martin K. Petersen, linux-ide

Hello,

On Tue, Sep 08, 2015 at 09:56:22AM -0700, Shaohua Li wrote:
> I'll benchmark on an SSD from another vendor, which supports FUA, but
> FUA writes had poor performance in my last test.

lolwut? Was it slower than write + flush?

thanks.

--
tejun

* Re: raid5-cache I/O path improvements

From: Shaohua Li @ 2015-09-08 17:07 UTC
  To: Tejun Heo
  Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team,
      dan.j.williams, Martin K. Petersen, linux-ide

On Tue, Sep 08, 2015 at 01:02:26PM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 08, 2015 at 09:56:22AM -0700, Shaohua Li wrote:
> > I'll benchmark on an SSD from another vendor, which supports FUA,
> > but FUA writes had poor performance in my last test.
>
> lolwut? Was it slower than write + flush?

I need to double-check. But note that for write + flush we aggregate
several writes and then do one flush, while with FUA we do every
metadata write with FUA. So this is not an apples-to-apples comparison.

Thanks,
Shaohua

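[Illustrative sketch - hypothetical code showing why the comparison
above is not apples-to-apples: with write + flush, one flush makes a
whole batch of log writes durable, while with FUA every metadata write
pays the durability cost itself. It reuses the made-up r5l_log_sketch
names and headers from the earlier sketch.]

/* Scheme 1: N plain writes, then a single flush for the whole batch. */
static void sketch_commit_batch_flush(struct r5l_log_sketch *log,
                                      struct bio **bios, int nr)
{
        int i;

        for (i = 0; i < nr; i++)
                submit_bio(WRITE, bios[i]);
        /* ... wait for all nr writes to complete, then ... */
        blkdev_issue_flush(log->log_bdev, GFP_NOIO, NULL);
        /* one flush covers every write in the batch */
}

/* Scheme 2: every write carries FUA and is durable on its own. */
static void sketch_commit_batch_fua(struct bio **bios, int nr)
{
        int i;

        for (i = 0; i < nr; i++)
                submit_bio(WRITE_FUA, bios[i]);
}
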
* Re: raid5-cache I/O path improvements

From: Tejun Heo @ 2015-09-08 17:34 UTC
  To: Shaohua Li
  Cc: Christoph Hellwig, neilb, linux-raid, Kernel-team,
      dan.j.williams, Martin K. Petersen, linux-ide

Hello,

On Tue, Sep 08, 2015 at 10:07:36AM -0700, Shaohua Li wrote:
> I need to double-check. But note that for write + flush we aggregate
> several writes and then do one flush, while with FUA we do every
> metadata write with FUA. So this is not an apples-to-apples
> comparison.

Does that mean that upper layers are taking different actions depending
on whether the underlying device supports FUA? That at least wasn't
the original model I had in mind when implementing the current
incarnation of REQ_FLUSH and FUA. The only difference FUA was expected
to make was optimizing out the flush after REQ_FUA; the upper layers
were expected to issue REQ_FLUSH/FUA the same way whether or not the
device supports FUA.

Hmmm... grep tells me that dm and md actually are branching on whether
the underlying device supports FUA. This is tricky. I didn't even
mean flush_flags to be used directly by upper layers. For rotational
devices, doing multiple FUAs compared to multiple writes followed by
REQ_FLUSH is probably a lot worse - the head gets moved multiple times,
likely skipping over data which could be written out while traversing,
and it's not like stalling the write pipeline and draining the write
queue has much impact on hard drives.

Maybe it's different on SSDs. I'm not sure how expensive flush itself
would be, given that a lot of the write cost is paid asynchronously
anyway during gc, but flush stalls the IO pipeline, and that could be
very noticeable on high-iops devices. Also, unless the implementation
is braindead, FUA IOs are unlikely to be expensive on SSDs, so maybe
what we should do is have the block layer hint to upper layers which
is likely to perform better.

But, ultimately, I don't think it'd be too difficult for high-depth
SSD devices to report write-through without losing any performance one
way or the other, and it'd be great if we can eventually get there.

Thanks.

--
tejun

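[Illustrative sketch - hypothetical code for the model described above:
the driver declares its cache capabilities once, upper layers tag bios
uniformly, and the decomposition is left to the block layer's flush
machinery. blk_queue_flush(), REQ_FLUSH/REQ_FUA and WRITE_FLUSH_FUA are
the interfaces of this era; the function names are made up, and the way
dm/md actually branch on flush_flags is not shown.]

#include <linux/blkdev.h>
#include <linux/fs.h>

/* Driver side (at probe time): declare what the hardware can do. */
static void sketch_declare_cache(struct request_queue *q,
                                 bool volatile_cache, bool has_fua)
{
        unsigned int flush = 0;

        if (volatile_cache)
                flush |= REQ_FLUSH;
        if (has_fua)
                flush |= REQ_FUA;
        blk_queue_flush(q, flush);
}

/*
 * Upper-layer side: always tag the bio the same way.  Depending on
 * what the queue declared, the flush machinery turns this into a
 * preflush + write + postflush sequence, a FUA write, or a plain
 * write.
 */
static void sketch_submit_commit_block(struct bio *bio)
{
        submit_bio(WRITE_FLUSH_FUA, bio);
}
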
* Re: raid5-cache I/O path improvements

From: Christoph Hellwig @ 2015-09-09 15:59 UTC
  To: Tejun Heo
  Cc: Shaohua Li, Christoph Hellwig, neilb, linux-raid, Kernel-team,
      dan.j.williams, Martin K. Petersen, linux-ide

On Tue, Sep 08, 2015 at 01:34:20PM -0400, Tejun Heo wrote:
> Hmmm... grep tells me that dm and md actually are branching on whether
> the underlying device supports FUA. This is tricky. I didn't even
> mean flush_flags to be used directly by upper layers. For rotational
> devices, doing multiple FUAs compared to multiple writes followed by
> REQ_FLUSH is probably a lot worse - the head gets moved multiple
> times, likely skipping over data which could be written out while
> traversing, and it's not like stalling the write pipeline and draining
> the write queue has much impact on hard drives.

Well, that's what we'd need to do for the raid cache as well, given the
results that Shaohua sees. Unless you have a good idea for another way
to handle the issue, we'll need to support both behaviors there.

Thread overview: 9+ messages
[not found] <1441603250-5119-1-git-send-email-hch@lst.de>
[not found] ` <20150908002840.GA3196542@devbig257.prn2.facebook.com>
2015-09-08 6:12 ` raid5-cache I/O path improvements Christoph Hellwig
2015-09-08 15:25 ` Tejun Heo
2015-09-08 15:26 ` Tejun Heo
2015-09-08 15:40 ` Christoph Hellwig
2015-09-08 16:56 ` Shaohua Li
2015-09-08 17:02 ` Tejun Heo
2015-09-08 17:07 ` Shaohua Li
2015-09-08 17:34 ` Tejun Heo
2015-09-09 15:59 ` Christoph Hellwig