* bug in md write barrier support?
@ 2004-09-03 17:24 Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread

From: Christoph Hellwig @ 2004-09-03 17:24 UTC
To: neilb, axboe
Cc: linux-kernel

md_flush_mddev just passes on the sector relative to the raid device,
shouldn't it be translated somewhere?
* Re: bug in md write barrier support?

From: Neil Brown @ 2004-09-04 0:56 UTC
To: Christoph Hellwig
Cc: axboe, linux-kernel

On Friday September 3, hch@lst.de wrote:
> md_flush_mddev just passes on the sector relative to the raid device,
> shouldn't it be translated somewhere?

Yes.  md_flush_mddev should simply be removed.
The functionality should be, and largely is, in the individual
personalities.

Is there documentation somewhere on exactly what an issue_flush_fn
should do?  (Is it allowed to sleep?  What must happen before it is
allowed to return?  What is the "error_sector" for?  That sort of
thing.)

I suspect that at least raid5 will need some fairly special handling.

NeilBrown
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-04 8:21 UTC
To: Neil Brown
Cc: Christoph Hellwig, linux-kernel

On Sat, Sep 04 2004, Neil Brown wrote:
> On Friday September 3, hch@lst.de wrote:
> > md_flush_mddev just passes on the sector relative to the raid device,
> > shouldn't it be translated somewhere?
>
> Yes.  md_flush_mddev should simply be removed.
> The functionality should be, and largely is, in the individual
> personalities.

Yes, sorry, I was a little lazy there even though I followed the
plugging conversion :(

> Is there documentation somewhere on exactly what an issue_flush_fn
> should do?  (Is it allowed to sleep?  What must happen before it is
> allowed to return?  What is the "error_sector" for?  That sort of
> thing.)

It is allowed to sleep, and you should only return when the flush is
complete.  error_sector is the failed location, which really should be
a (dev, sector) tuple.

-- 
Jens Axboe
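The contract Jens describes (the hook may sleep, must not return until the flush has completed, and reports a failed location through an error-sector result) can be modeled in a few lines. This is an illustrative sketch of the semantics only; the names are invented and this is not the 2.6 kernel API:

```python
class ToyQueue:
    """Toy model of a queue exposing an issue_flush_fn-style hook."""

    def __init__(self):
        self.cached = []        # writes acknowledged but possibly only cached
        self.stable = []        # writes known to be on stable media
        self.fail_at = None     # simulate a media error at this sector

    def write(self, sector):
        # Completion is signalled here, before the data is durable.
        self.cached.append(sector)

    def issue_flush(self):
        """Return (0, None) on success, (-1, error_sector) on failure.

        In the real thing this would block until the device-level
        cache flush has actually finished.
        """
        if self.fail_at is not None and self.fail_at in self.cached:
            return -1, self.fail_at
        self.stable += self.cached
        self.cached = []
        return 0, None

q = ToyQueue()
q.write(10)
q.write(11)
ret, err = q.issue_flush()      # everything acknowledged so far is now stable
```

Note that the single error sector is exactly the weak spot Jens concedes: for a stacked driver like md, one sector number with no device identity is ambiguous, hence the (dev, sector) tuple remark.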
* Re: bug in md write barrier support?

From: Neil Brown @ 2004-09-06 1:36 UTC
To: Jens Axboe
Cc: Christoph Hellwig, linux-kernel

On Saturday September 4, axboe@suse.de wrote:
> It is allowed to sleep, and you should only return when the flush is
> complete.  error_sector is the failed location, which really should be
> a (dev, sector) tuple.

Could I get a little more information about this function please?
I've read through the code, and there isn't much in the way of
examples to follow: only reiserfs uses it, and only scsi-disk and
ide-disk support it (I think).

It would seem that this is for write requests where b_end_io has
already been called, indicating that the data is safe, but that maybe
the data isn't really safe after all, and blk_issue_flush needs to be
called.

I would have thought that after b_end_io is called, that data should
be safe anyway.  Not so?

How do you tell a device: it is OK to just leave the data in cache,
I'll call blk_issue_flush when I want it safe?

Is this related to barriers at all?

NeilBrown
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-08 9:23 UTC
To: Neil Brown
Cc: Christoph Hellwig, linux-kernel

On Mon, Sep 06 2004, Neil Brown wrote:
> Could I get a little more information about this function please?
> I've read through the code, and there isn't much in the way of
> examples to follow: only reiserfs uses it, and only scsi-disk and
> ide-disk support it (I think).

That is correct.  The current definition is to ensure that previously
sent writes are on disk.  I hope to tie a range to it in the future,
for devices that can optimize the flush in that case.  So for ide with
write back caching, it's currently a FLUSH_CACHE command.  Ditto for
SCSI.  SCSI with write through cache can make it a noop as well.

> It would seem that this is for write requests where b_end_io has
> already been called, indicating that the data is safe, but that maybe
> the data isn't really safe after all, and blk_issue_flush needs to be
> called.

Right on.

> I would have thought that after b_end_io is called, that data should
> be safe anyway.  Not so?

Not necessarily, if you have write caching enabled.

> How do you tell a device: it is OK to just leave the data in cache,
> I'll call blk_issue_flush when I want it safe?

How would md know?  The lower level driver knows what to do (if
anything) to ensure the data is safe.

> Is this related to barriers at all?

Yes and no.  Currently it's used for fsync(), but it can be used for
anything where you want to insert a flush point without having a piece
of data to tie it to.

-- 
Jens Axboe
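The distinction Jens is making, that a write can be "completed" (b_end_io called) while the data still sits in a volatile drive cache, can be sketched as a toy model. Everything here is illustrative; it is not driver code:

```python
class WriteBackDisk:
    """Model of a drive with write-back caching enabled."""

    def __init__(self):
        self.cache = []              # volatile write cache
        self.platter = []            # stable media

    def submit_write(self, block):
        self.cache.append(block)
        return "completed"           # completion fires before data is stable

    def flush(self):
        # Models FLUSH_CACHE: only now is cached data durable.
        self.platter += self.cache
        self.cache = []

    def power_cut(self):
        self.cache = []              # unflushed cache contents are lost

d = WriteBackDisk()
status = d.submit_write("A")         # "completed", but only cached
d.power_cut()
survived = "A" in d.platter          # False: completion did not mean safe
```

With a write-through cache, submit_write would populate the platter directly, which is why Jens notes the flush can be a noop there.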
* Re: bug in md write barrier support?

From: Alan Cox @ 2004-09-08 13:35 UTC
To: Jens Axboe
Cc: Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Mer, 2004-09-08 at 10:23, Jens Axboe wrote:
> That is correct.  The current definition is to ensure that previously
> sent writes are on disk.  I hope to tie a range to it in the future,
> for devices that can optimize the flush in that case.  So for ide with
> write back caching, it's currently a FLUSH_CACHE command.  Ditto for
> SCSI.  SCSI with write through cache can make it a noop as well.

Some semantics questions I have, thinking about it from the I2O and
aacraid side: you talk about it as a barrier.  Can other I/O cross the
cache flush?  In other words, if I issue a flush_cache and continue
doing I/O, the flush will finish when the I/O outstanding at that time
has completed, but other I/O may get scheduled to disk first.

Secondly, what are the intended semantics for a flush error?
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-08 15:46 UTC
To: Alan Cox
Cc: Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Wed, Sep 08 2004, Alan Cox wrote:
> Some semantics questions I have, thinking about it from the I2O and
> aacraid side: you talk about it as a barrier.  Can other I/O cross the
> cache flush?  In other words, if I issue a flush_cache and continue
> doing I/O, the flush will finish when the I/O outstanding at that time
> has completed, but other I/O may get scheduled to disk first.

That's a worry if it really does that - does it, or are you just
speculating about possible problems?

> Secondly, what are the intended semantics for a flush error?

It's up to the issuer.  For IDE it would ideally be issuing FLUSH_CACHE
repeatedly until it doesn't error anymore, but keeping track of the
error location.  Come to think of it, we should pass down the range
right now to flag which range we are actually interested in being
errored on.

-- 
Jens Axboe
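One reading of the retry strategy Jens sketches, reissue the flush until it stops failing while remembering the first failed location to report back, looks roughly like this. This is a hypothetical sketch of the idea, not the IDE driver's actual error handling:

```python
def flush_with_retries(issue_flush, max_tries=8):
    """Retry a flush hook until it succeeds or max_tries is exhausted.

    issue_flush() -> (ret, error_sector).  The first error location is
    kept so the caller learns where data may have been lost, even if a
    later retry succeeds.
    """
    first_error = None
    for _ in range(max_tries):
        ret, sector = issue_flush()
        if ret == 0:
            return 0, first_error
        if first_error is None:
            first_error = sector
    return -1, first_error

# Fake device that fails twice at sectors 77 and 78, then succeeds.
attempts = iter([(-1, 77), (-1, 78), (0, None)])
ret, first_err = flush_with_retries(lambda: next(attempts))
```

The caller ends up with a successful flush plus the first failed sector, which matches the "keep track of the error location" part of the suggestion.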
* Re: bug in md write barrier support?

From: Alan Cox @ 2004-09-08 22:21 UTC
To: Jens Axboe
Cc: Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Mer, 2004-09-08 at 16:46, Jens Axboe wrote:
> That's a worry if it really does that - does it, or are you just
> speculating about possible problems?

I2O defines cache flush very loosely.  It flushes the cache and returns
when the cache has been flushed.  From playing with the controllers I
have, it seems some at least merge further queued writes into the
output stream.  Thus if I issue

	write 1, 2, 3, 4, 40, 41, flush cache, write 5, 6, 100

it'll write 1, 2, 3, 4, 5, 6, 40, 41, then report flush cache complete.

Obviously I can implement full barrier semantics in the driver if need
be, but that would cost performance, hence the question.
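Alan's observation can be captured as a small model: the flush completes once everything issued before it is out, but later-queued writes that sort next to earlier ones get merged into the output stream ahead of the completion. This is a guess at the observed controller behavior, not I2O specification semantics:

```python
def i2o_like_flush(before, after):
    """Model a controller that merges post-flush writes into the stream.

    `before` are sectors written before the flush, `after` those queued
    after it.  Returns (sectors out when "flush complete" is reported,
    sectors still pending).
    """
    horizon = max(before)
    # Later writes at or below the highest pre-flush sector get merged in.
    merged = sorted(set(before) | {w for w in after if w <= horizon})
    pending = [w for w in after if w > horizon]
    return merged, pending

out, pending = i2o_like_flush([1, 2, 3, 4, 40, 41], [5, 6, 100])
# Reproduces the ordering in the message: 1,2,3,4,5,6,40,41 out, 100 pending.
```

The flush guarantee ("everything before me is on disk") still holds in this model; what it does not provide is a barrier against later writes overtaking it, which is exactly the distinction Alan is asking about.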
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-09 8:06 UTC
To: Alan Cox
Cc: Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Wed, Sep 08 2004, Alan Cox wrote:
> Obviously I can implement full barrier semantics in the driver if need
> be, but that would cost performance, hence the question.

Precisely, it's always possible to just drop queueing depth to zero at
that point.  If I2O really does reorder around the cache flush (this
seems broken...), then you probably should.

-- 
Jens Axboe
* Re: bug in md write barrier support?

From: Arjan van de Ven @ 2004-09-09 8:22 UTC
To: Jens Axboe
Cc: Alan Cox, Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

> Precisely, it's always possible to just drop queueing depth to zero at
> that point.  If I2O really does reorder around the cache flush (this
> seems broken...),

Why does this seem broken?  Semantics of "cache flush guarantees that
all io submitted prior to it hits the spindle" are quite sane imo; no
guarantee of later submitted IO.  Compare the unix "sync" command; same
level of semantics.
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-09 8:29 UTC
To: Arjan van de Ven
Cc: Alan Cox, Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Thu, Sep 09 2004, Arjan van de Ven wrote:
> Why does this seem broken?  Semantics of "cache flush guarantees that
> all io submitted prior to it hits the spindle" are quite sane imo; no
> guarantee of later submitted IO.  Compare the unix "sync" command;
> same level of semantics.

Depends on your angle, I think it breaks the principle of least
surprise.

-- 
Jens Axboe
* Re: bug in md write barrier support?

From: Alan Cox @ 2004-09-09 12:51 UTC
To: Jens Axboe
Cc: Arjan van de Ven, Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Iau, 2004-09-09 at 09:29, Jens Axboe wrote:
> Depends on your angle, I think it breaks the principle of least
> surprise.

As far as I can ascertain, raid controllers in general follow this set
of semantics.  It's less of an issue for many of them with battery
backup, obviously.

It also makes a lot of sense at the hardware level for performance,
especially when dealing with raid.

Alan
* Re: bug in md write barrier support?

From: Jens Axboe @ 2004-09-09 14:34 UTC
To: Alan Cox
Cc: Arjan van de Ven, Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Thu, Sep 09 2004, Alan Cox wrote:
> As far as I can ascertain, raid controllers in general follow this set
> of semantics.  It's less of an issue for many of them with battery
> backup, obviously.
>
> It also makes a lot of sense at the hardware level for performance,
> especially when dealing with raid.

Yes.  As long as the required semantics aren't explicitly guaranteed in
the specification, we should not rely on it.

-- 
Jens Axboe
* Re: bug in md write barrier support?

From: Rogier Wolff @ 2004-09-12 17:13 UTC
To: Alan Cox
Cc: Jens Axboe, Neil Brown, Christoph Hellwig, Linux Kernel Mailing List

On Wed, Sep 08, 2004 at 11:21:39PM +0100, Alan Cox wrote:
> I2O defines cache flush very loosely.  It flushes the cache and
> returns
[...]
> write 1, 2, 3, 4, 40, 41, flush cache, write 5, 6, 100
> it'll write 1, 2, 3, 4, 5, 6, 40, 41, then report flush cache
> complete.

which, if 5 and 6 are the metadata updates belonging to logfile writes
40 and 41, and the system powers down between 5 and 41, spells trouble.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
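Rogier's failure window is easy to state concretely: take the controller output ordering from Alan's message and cut power partway through. This is purely a model of the scenario, not filesystem code:

```python
def on_disk_after_crash(stream, last_written):
    """Blocks durable when power fails right after `last_written`."""
    return stream[:stream.index(last_written) + 1]

# The controller output ordering from Alan's message: metadata blocks
# 5 and 6 were merged ahead of the log writes 40 and 41 they depend on.
controller_order = [1, 2, 3, 4, 5, 6, 40, 41]

on_disk = on_disk_after_crash(controller_order, 6)  # power lost after 6
inconsistent = (5 in on_disk) and (40 not in on_disk)
```

The metadata is durable while the log entries it references are not, which is the write-after-log invariant that journaling relies on a barrier to protect.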