* [linux-lvm] fsync() and LVM
From: Marco Colombo @ 2009-03-13 17:46 UTC
To: LVM general discussion and development

Hi,

I'm a long-time user of both PostgreSQL and LVM. So far I've been quite happy with both. But a recent thread on the PostgreSQL list made me uncomfortable. What is this thing they're referring to, fsync() calls being ignored? It makes me feel like I'm running on thin ice without even knowing it. Before I start phasing out LVM from all my PostgreSQL installations (as they suggest), I'd like to hear some kind of confirmation. This is quite scary.

http://archives.postgresql.org/pgsql-general/2009-03/msg00204.php

In my understanding:

fsync(): force data from OS memory to disk (ending up in the disk cache)
write barrier: force data from disk cache to disk platters

If you disable the write-back cache on the disks, you no longer need write barriers. But apparently they claim LVM is unsafe even with disk caches in write-through mode, which surprises me a lot.

.TM.
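For reference, this is roughly what "force data from OS memory to disk" looks like from the application side: write, flush user-space buffers, fsync() the file and, for a newly created file, fsync() its directory too. A minimal Python sketch, assuming a POSIX system (Linux) where a directory can be opened read-only and fsync()ed; the helper name `durable_write` is hypothetical. It can only ask the kernel; whether the data survives in a drive's write-back cache depends on the lower layers discussed in this thread.

```python
import os

def durable_write(path, data):
    """Write bytes to path and ask the kernel to push them to stable storage.

    POSIX-only sketch: fsync() on a directory fd works on Linux, not on Windows.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write the dirty pages and wait;
                               # raises OSError if the kernel reports an I/O error
    # A new file's directory entry lives in the directory, so sync that too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```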
* Re: [linux-lvm] fsync() and LVM
From: Stuart D. Gathman @ 2009-03-13 20:08 UTC
To: LVM general discussion and development

On Fri, 13 Mar 2009, Marco Colombo wrote:

> Hi, I'm a long time user of both PostgreSQL and LVM. So far I've been quite
> happy with both. But a recent thread on the PostgreSQL list made me
> uncomfortable. What is this thing they're referring to, fsync()'s being
> ignored? Makes me feel like I'm running on thin ice, without even
> knowing. Before I start phasing out LVM from all my PostgreSQL installations
> (as they suggest), I'd like to hear some kind of confirmation.

> http://archives.postgresql.org/pgsql-general/2009-03/msg00204.php

The discussion doesn't make a lot of sense. fsync() is a filesystem call - it can't possibly be handled (or ignored) at a lower level, because the lower level doesn't know which blocks belong to the file.

I *can* imagine that perhaps the raw block writes used by the filesystem code might be ignored, or improperly cached. Clearly they are not ignored (filesystems do get updated), so if there is any substance to the charge, it must be that LVM reorders writes somehow. Caching doesn't really break anything - it is the *reordering* of writes that could be a problem. A "write barrier" says "finish these writes before you start any more, but otherwise reorder how you like".

I did some experiments with iostat, and I am convinced that LVM does not itself do any reordering of writes.

Here is my theory as to what is really going on:

LVM is not really "ignoring" fsync(), because it would never see it. However, in the presence of hardware write-back caching in disk drives, fsync() would need to tell the hardware to "finish all these writes before you start any more" (i.e. a write barrier) for fsync() to be effective. I suspect that LVM simply fails to pass the *write barrier* through to the underlying layers (i.e. ignores the write barrier call). Thus, you should be fine if you simply turn off write-back caching in your disk drives. If you could guarantee that the disk drives would stay powered long enough after the main system stops, even a drive write-back cache would not be much of a risk.

-- 
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
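Stuart's definition of a barrier ("finish these writes before you start any more, but otherwise reorder how you like") can be modelled as a toy write cache that may shuffle writes freely, but never across a barrier. This is a hypothetical Python simulation, not kernel code; `BARRIER` and `drain_order` are illustrative names.

```python
import random

BARRIER = object()  # hypothetical marker separating groups of writes

def drain_order(ops, rng):
    """Return the order in which a toy write cache commits writes to the platter.

    Writes between two barriers may be committed in any order (modelled as a
    random shuffle), but every write issued before a barrier is committed
    before any write issued after it.
    """
    committed, epoch = [], []
    for op in ops:
        if op is BARRIER:
            rng.shuffle(epoch)        # free reordering inside the epoch
            committed.extend(epoch)
            epoch = []
        else:
            epoch.append(op)
    rng.shuffle(epoch)                # the trailing epoch is also unordered
    committed.extend(epoch)
    return committed
```

With a barrier between the journal blocks and the commit block, the commit always lands last; drop the barrier and the commit block can reach the platter before the journal blocks it describes.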
* Re: [linux-lvm] fsync() and LVM
From: Ben Chobot @ 2009-03-13 20:29 UTC
To: LVM general discussion and development

On Fri, 13 Mar 2009, Stuart D. Gathman wrote:

> I suspect that LVM simply fails to pass the *write barrier* through
> to underlying layers (i.e. ignores the write barrier call). Thus, you should be
> fine if you simply turn off writeback caching in your disk drives. If you
> could guarantee that disk drives would be powered long enough after the main
> system stops - even a drive write back cache would not be much of a risk.

The big question in my mind is which software layers don't get to see the write barrier. If any of them can reorder writes, that could (however unlikely) lead to data corruption.
* Re: [linux-lvm] fsync() and LVM
From: Alasdair G Kergon @ 2009-03-13 20:38 UTC
To: LVM general discussion and development

Let's try to clear up the confusion.

Kernel device-mapper (which lvm uses) does not support write barriers except in very restricted circumstances (when only one device is involved and the mapping is trivial). If dm receives a write barrier which is not supported, it notifies the caller (typically a filesystem) so appropriate action can be taken if it wishes.

Several kernel releases ago, the implementation of the 'flush device' operation in the block layer was changed from a simple function call that dm supported to a mechanism involving barriers that is trickier for dm to support. Previously 'flush' could not fail, and so callers do not generally have strategies to handle such a situation.

The latest of several attempts to support barriers is contained in patches here:
http://patchwork.kernel.org/project/dm-devel/list/?q=barriers

Please review and test if you are interested!

Alasdair
-- 
agk@redhat.com
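What "appropriate action" by the caller might look like, as a hypothetical Python model: the device rejects a barrier request (modelled here as an exception), and the caller falls back to waiting for its earlier writes before submitting the dependent one. Every class and function name is invented for illustration; this does not reproduce any real filesystem's code.

```python
class BarrierNotSupported(Exception):
    """Hypothetical error a device raises when it cannot honour a barrier."""

class ToyDevice:
    def __init__(self, supports_barriers):
        self.supports_barriers = supports_barriers
        self.log = []  # record of what the caller asked the device to do

    def submit(self, block, barrier=False):
        if barrier and not self.supports_barriers:
            raise BarrierNotSupported(block)
        self.log.append(("barrier-write" if barrier else "write", block))

    def wait_all(self):
        # Stand-in for waiting until every submitted write has completed.
        self.log.append(("wait", None))

def commit(dev, journal_blocks, commit_block):
    """Write journal blocks, then the commit block, keeping the commit ordered."""
    for b in journal_blocks:
        dev.submit(b)
    try:
        dev.submit(commit_block, barrier=True)
        return "barrier"
    except BarrierNotSupported:
        # Fallback: wait for the journal writes, then write the commit block.
        # This orders writes at the OS level only; it does not flush a
        # drive's write-back cache.
        dev.wait_all()
        dev.submit(commit_block)
        return "fallback"
```

The fallback keeps ordering intact as long as "completed" really means "on stable storage", which is why the drive write cache setting keeps coming up in this thread.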
* Re: [linux-lvm] fsync() and LVM
From: Marco Colombo @ 2009-03-14 3:16 UTC
To: LVM general discussion and development

Alasdair G Kergon wrote:
> Several kernel releases ago, the implementation of the 'flush device'
> operation in the block layer was changed from a simple function call
> that dm supported to a mechanism involving barriers that is trickier for
> dm to support. Previously 'flush' could not fail and so callers do not
> generally have strategies to handle such a situation.

The 'caller' here would be fsync() in the FS. What strategies are available to handle a failing 'flush'? Is there anything that can be done at the application level (userland)?

More than anything, does LVM (or device-mapper) really reorder writes? Is it safe with disk caches in write-through mode? (hdparm -W0)

.TM.
* RE: [linux-lvm] fsync() and LVM
From: Dietmar Maurer @ 2009-03-14 9:07 UTC
To: LVM general discussion and development

> Let's try to clear up the confusion.
>
> Kernel device-mapper (which lvm uses) does not support write barriers
> except in very restricted circumstances (when only one device is
> involved and the mapping is trivial). If dm receives a write barrier
> which is not supported it notifies the caller (typically a filesystem)
> so appropriate action can be taken if it wishes.

Does that mean I should never use more than one device if I have applications depending on fsync (databases)?

- Dietmar
* RE: [linux-lvm] fsync() and LVM
From: Stuart D. Gathman @ 2009-03-14 14:31 UTC
To: LVM general discussion and development

On Sat, 14 Mar 2009, Dietmar Maurer wrote:

> > Let's try to clear up the confusion.
> >
> > Kernel device-mapper (which lvm uses) does not support write barriers
> > except in very restricted circumstances (when only one device is
> > involved and the mapping is trivial). If dm receives a write barrier
> > which is not supported it notifies the caller (typically a filesystem)
> > so appropriate action can be taken if it wishes.
>
> Does that mean I should never use more than one device if I have
> applications depending on fsync (databases)?

It just means that write barriers won't get passed to the device. This is only a problem if the devices have write caches. Note that with multiple devices, even a FIFO write cache could cause reordering between devices (one device could finish faster than another).
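Stuart's last point, that each device can be strictly FIFO while the system as a whole still reorders, can be checked with a small simulation. A hypothetical Python model, assuming each device completes one queued write per fixed service time; the device names and timings are made up.

```python
def completion_order(writes, service_time):
    """Simulate per-device FIFO queues and return write names in completion order.

    writes:       list of (issue_time, device, name), in issue order
    service_time: dict mapping device -> time needed to complete one write
    Each device serves its own queue strictly first-in, first-out.
    """
    busy_until = {dev: 0.0 for dev in service_time}
    finished = []
    for seq, (issue_time, dev, name) in enumerate(writes):
        start = max(issue_time, busy_until[dev])  # FIFO: wait for earlier writes on this device
        finish = start + service_time[dev]
        busy_until[dev] = finish
        finished.append((finish, seq, name))      # seq breaks ties by issue order
    finished.sort()
    return [name for _, _, name in finished]
```

For instance, if a logical volume spans a slow and a fast disk, a journal write issued first to the slow disk can complete after a commit write issued later to the fast one, even though neither disk reorders its own queue.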
* Re: [linux-lvm] fsync() and LVM
From: Marco Colombo @ 2009-03-15 0:51 UTC
To: LVM general discussion and development

Stuart D. Gathman wrote:
> On Sat, 14 Mar 2009, Dietmar Maurer wrote:
> It just means that write barriers won't get passed to the device.
> This is only a problem if the devices have write caches. Note
> that with multiple devices, even a FIFO write cache could cause
> reordering between devices (one device could finish faster than another).

No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(), that data doesn't even get to the controller, and that it doesn't matter whether the disks have write caches enabled or not, or whether they have battery-backed caches. Please read the thread I linked. If what they say is true, you can't use LVM for anything that needs fsync(), including mail queues (sendmail), mail storage (imapd) and so on. So I'd really like to know.

.TM.
* Re: [linux-lvm] fsync() and LVM
From: Charles Marcus @ 2009-03-16 11:02 UTC
To: LVM general discussion and development

On 3/14/2009 8:51 PM, Marco Colombo wrote:
> Stuart D. Gathman wrote:
>> On Sat, 14 Mar 2009, Dietmar Maurer wrote:
>> It just means that write barriers won't get passed to the device.
>> This is only a problem if the devices have write caches. Note
>> that with multiple devices, even a FIFO write cache could cause
>> reordering between devices (one device could finish faster than another).
>
> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),
> that data doesn't even get to the controller, and it doesn't matter
> if the disks have write caches enabled or not. Or if they have battery backed
> caches. Please read the thread I linked. If what they say it's true,
> you can't use LVM for anything that needs fsync(), including mail queues
> (sendmail), mail storage (imapd), as such. So I'd really like to know.

Seeing as my /var (with both postfix & courier-imap using it for mail storage) has been on lvm for almost 4 years, that would be news to me...

;)

-- 
Best regards,

Charles
* Re: [linux-lvm] fsync() and LVM
From: Martin Schröder @ 2009-03-16 11:05 UTC
To: LVM general discussion and development

2009/3/16, Charles Marcus <CMarcus@media-brokers.com>:
> Seeing as my /var (with both postfix & courier-imap using it for mail
> storage) has been on lvm for almost 4 years, that would be news to me...

And how often has the computer crashed, needing an fsck, in those years? It's most likely no problem if the fs is always unmounted cleanly.

Best
   Martin
* Re: [linux-lvm] fsync() and LVM
From: Charles Marcus @ 2009-03-16 11:18 UTC
To: LVM general discussion and development

On 3/16/2009, Martin Schröder (martin@oneiros.de) wrote:
> And how often has the computer crashed needing an fsck in those years?
> It's most likely no problem if the fs is always unmounted cleanly.

There have been probably 4 unclean shutdowns (due to extended power outages) in these 4 years, 2 of which required an extended fsck...

Running reiserfs too...

Zero problems to date (knock on wood)...

-- 
Best regards,

Charles
* RE: [linux-lvm] fsync() and LVM
From: Dietmar Maurer @ 2009-03-16 11:25 UTC
To: LVM general discussion and development

> On 3/16/2009, Martin Schröder (martin@oneiros.de) wrote:
> > And how often has the computer crashed needing an fsck in those years?
> > It's most likely no problem if the fs is always unmounted cleanly.
>
> There have been probably 4 unclean shutdowns (due to extended power
> outages) in these 4 years, 2 of which required an extended fsck...
>
> Running reiserfs too...
>
> Zero problems to date (knock on wood)...

The question is whether fsync is implemented correctly or not.

- Dietmar
* Re: [linux-lvm] fsync() and LVM
From: Marco Colombo @ 2009-03-16 14:36 UTC
To: LVM general discussion and development

Charles Marcus wrote:
> On 3/14/2009 8:51 PM, Marco Colombo wrote:
>> Stuart D. Gathman wrote:
>>> On Sat, 14 Mar 2009, Dietmar Maurer wrote:
>>> It just means that write barriers won't get passed to the device.
>>> This is only a problem if the devices have write caches. Note
>>> that with multiple devices, even a FIFO write cache could cause
>>> reordering between devices (one device could finish faster than another).
>>
>> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),
>> that data doesn't even get to the controller, and it doesn't matter
>> if the disks have write caches enabled or not. Or if they have battery backed
>> caches. Please read the thread I linked. If what they say it's true,
>> you can't use LVM for anything that needs fsync(), including mail queues
>> (sendmail), mail storage (imapd), as such. So I'd really like to know.
>
> Seeing as my /var (with both postfix & courier-imap using it for mail
> storage) has been on lvm for almost 4 years, that would be news to me...
>
> ;)

Believe it or not, they both depend on fsync(). Anyway, even if you lost a message, how would you expect to know? If you have a large enough user base, you're used to 'missing' messages (99% of them of the user-deleted or user-never-sent kind). A truly lost one may have gone unnoticed in the noise.

A lying fsync() doesn't blow up your whole mail repository; you may just lose one or two messages in a crash. Or a transaction, speaking of databases. If that's the case, I would like to know, that's all.

.TM.
* Re: [linux-lvm] fsync() and LVM
From: Stuart D. Gathman @ 2009-03-16 17:13 UTC
To: LVM general discussion and development

On Mon, 16 Mar 2009, Marco Colombo wrote:

> >> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),
> >> that data doesn't even get to the controller, and it doesn't matter

If that were the case, then just *attempting* to call fsync would corrupt your filesystem/database: "dirty" blocks would not actually get written, but would still get marked "clean". Clearly, LVM does not interfere with writing to the disk. It is only write barriers (waiting for the writes to actually finish) that don't get passed through (in all but the simplest cases). It simply returns you to the old days when the man page for sync() said "this queues dirty blocks for writing but does not wait for them to finish", and shutdown scripts called sync() multiple times with sleeps in between.

> A lying fsync() doesn't blow all your mail repository up, just you may
> lose one/two messages on a crash. Or a transaction, speaking of databases.
> If that's the case, I would like to know, that's all.

Since fsync() returns "fail" when LVM can't map it to multiple devices, it isn't exactly "lying". And one possible response to a failure might be to wait a bit.

According to the Red Hat guy, this problem came up when the simple block device "flush" call was replaced with the more complex write barrier. LVM had no problem passing through a simple block device flush. (Why couldn't the simple "flush" call still be available?) I would like to know which kernel version made this change.
* Re: [linux-lvm] fsync() and LVM
From: Stuart D. Gathman @ 2009-03-16 17:17 UTC
To: LVM general discussion and development

On Sun, 15 Mar 2009, Marco Colombo wrote:

> Stuart D. Gathman wrote:
> > On Sat, 14 Mar 2009, Dietmar Maurer wrote:
> > It just means that write barriers won't get passed to the device.
> > This is only a problem if the devices have write caches. Note
> > that with multiple devices, even a FIFO write cache could cause
> > reordering between devices (one device could finish faster than another).
>
> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),

That is clearly wrong, since fsync() isn't LVM's responsibility. I think they mean that fsync() can't guarantee that any writes are actually on the platter.

> that data doesn't even get to the controller, and it doesn't matter
> if the disks have write caches enabled or not. Or if they have battery backed
> caches. Please read the thread I linked. If what they say it's true,

That is clearly wrong. If writes don't work, nothing works.

> you can't use LVM for anything that needs fsync(), including mail queues
> (sendmail), mail storage (imapd), as such. So I'd really like to know.

fsync() is a file system call that writes dirty buffers and then waits for the physical writes to complete. It is only the waiting part that is broken.
* Re: [linux-lvm] fsync() and LVM
From: Les Mikesell @ 2009-03-16 18:50 UTC
To: LVM general discussion and development

Stuart D. Gathman wrote:
>
>> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),
>
> That is clearly wrong - since fsync() isn't LVM's responsibility.
> I think they mean that fsync() can't guarantee that any writes are
> actually on the platter.
>
>> that data doesn't even get to the controller, and it doesn't matter
>> if the disks have write caches enabled or not. Or if they have battery backed
>> caches. Please read the thread I linked. If what they say it's true,
>
> That is clearly wrong. If writes don't work, nothing works.
>
>> you can't use LVM for anything that needs fsync(), including mail queues
>> (sendmail), mail storage (imapd), as such. So I'd really like to know.
>
> fsync() is a file system call that writes dirty buffers, and then waits
> for the physical writes to complete. It is only the waiting part that
> is broken.

It's a yes-or-no question... fsync() either guarantees that the write is committed to physical media, so the application can continue knowing that its own transactional expectations are met (i.e. you can crash and recover that piece of data), or it is broken. If it doesn't wait for completion, it can't possibly report the correct status.

-- 
Les Mikesell
lesmikesell@gmail.com
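Les's point, that a call which does not wait for completion cannot report the outcome, can be shown with a toy model in which "physical writes" run on a thread pool. `fsync_waiting` and `fsync_not_waiting` are hypothetical names, and the model says nothing about how the kernel actually implements fsync(). It only shows why reporting status requires waiting; the non-waiting variant behaves like the old sync() Stuart describes.

```python
from concurrent.futures import ThreadPoolExecutor

def ok_write(block):
    """Stand-in for a physical write that succeeds."""
    return block

def failing_write(block):
    """Stand-in for a physical write that fails (e.g. a media error)."""
    raise IOError("write of %s failed" % block)

def fsync_waiting(pool, writes):
    """Submit every dirty block, then wait; any write error is re-raised."""
    futures = [pool.submit(fn, block) for fn, block in writes]
    for f in futures:
        f.result()      # blocks until this write finishes; re-raises its error
    return 0

def fsync_not_waiting(pool, writes):
    """Submit every dirty block and return at once, like the old sync()."""
    for fn, block in writes:
        pool.submit(fn, block)
    return 0            # reports success without knowing the outcome
```

With one failing block in the batch, the non-waiting variant still returns 0; only the waiting variant can surface the error to the application.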
* Re: [linux-lvm] fsync() and LVM
From: Greg Freemyer @ 2009-03-16 19:36 UTC
To: LVM general discussion and development

On Mon, Mar 16, 2009 at 2:50 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
> Stuart D. Gathman wrote:
>>
>>> No, it's more than that. PostgreSQL gurus say LVM doesn't honor fsync(),
>>
>> That is clearly wrong - since fsync() isn't LVM's responsibility.
>> I think they mean that fsync() can't guarantee that any writes are
>> actually on the platter.
>>
>>> that data doesn't even get to the controller, and it doesn't matter
>>> if the disks have write caches enabled or not. Or if they have battery backed
>>> caches. Please read the thread I linked. If what they say it's true,
>>
>> That is clearly wrong. If writes don't work, nothing works.
>>
>>> you can't use LVM for anything that needs fsync(), including mail queues
>>> (sendmail), mail storage (imapd), as such. So I'd really like to know.
>>
>> fsync() is a file system call that writes dirty buffers, and then waits
>> for the physical writes to complete. It is only the waiting part that
>> is broken.
>
> It's a yes or no question... Fsync() either guarantees that the write is
> committed to physical media so the application can continue knowing that
> its own transactional expectations are met (i.e. you can crash and recover
> that piece of data), or it is broken. If it doesn't wait for completion, it
> can't possibly report the correct status.
>

This discussion seems a bit bizarre to me. Many apps require data to get to stable memory in a well-defined way. Barriers are certainly one way to do that, but I don't think barriers are supported by LVM, mdraid, or drbd.

Those are some very significant subsystems. I have to believe filesystems have another way to implement fsync if barriers are not supported in the stack of block subsystems.

Maybe this discussion needs to move to a filesystem list, since it is the filesystem that is responsible for making fsync() work even in the absence of barriers.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
* [linux-lvm] liblvm status question
From: ben scott @ 2009-03-16 19:55 UTC
To: LVM general discussion and development

Is the liblvm project in a state where vgs, lvs or pvs functionality is mostly working? I am writing a program for working with logical volumes, and it would be very helpful if I could start integrating liblvm now, even if it is still buggy at the moment. Also, where can I find the files or CVS?

Thank you
* Re: [linux-lvm] liblvm status question
From: Greg Freemyer @ 2009-03-16 20:58 UTC
To: LVM general discussion and development

On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
> working? I am writing a program for working with logical volumes and it would
> be very helpful if I could start integrating liblvm now even if it is still
> buggy at the moment. Also, where can I find the files or cvs?
>
> Thank you

I think you mean libdevmapper, don't you? I'm pretty sure libdevmapper is used by the core LVM 2.0 tools.

Unfortunately, I'm not aware of any documentation about what the API is. I guess you have to read the source.

Greg
* Re: [linux-lvm] liblvm status question
From: Bryn M. Reeves @ 2009-03-17 10:38 UTC
To: LVM general discussion and development

On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
> On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> > Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
> > working? I am writing a program for working with logical volumes and it would
> > be very helpful if I could start integrating liblvm now even if it is still
> > buggy at the moment. Also, where can I find the files or cvs?
> >
> > Thank you
>
> I think you mean libdevmapper don't you?

No, he means liblvm:

http://fedoraproject.org/wiki/Features/liblvm
http://fedoraproject.org/wiki/LVM/liblvm

Patches are just starting to be merged, but it's still a work in progress:

http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html

Follow the lvm-devel mailing list to keep track of what's going on.

Regards,
Bryn.
* Re: [linux-lvm] liblvm status question
From: ben scott @ 2009-03-17 18:42 UTC
To: LVM general discussion and development

On Tuesday 17 March 2009 3:38:24 am Bryn M. Reeves wrote:
> On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
> > On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> > > Is the liblvm project at a state where vgs, lvs or pvs functionality is
> > > mostly working? I am writing a program for working with logical volumes
> > > and it would be very helpful if I could start integrating liblvm now
> > > even if it is still buggy at the moment. Also, where can I find the
> > > files or cvs?
> > >
> > > Thank you
> >
> > I think you mean libdevmapper don't you?
>
> No, he means liblvm:
>
> http://fedoraproject.org/wiki/Features/liblvm
> http://fedoraproject.org/wiki/LVM/liblvm
>
> Patches are just starting to be merged but it's still a
> work-in-progress:
>
> http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html
>
> Follow the lvm-devel mailing list to keep track of what's going on.
>
> Regards,
> Bryn.

Thank you.
* Re: [linux-lvm] liblvm status question
From: Greg Freemyer @ 2009-03-17 20:52 UTC
To: LVM general discussion and development

On Tue, Mar 17, 2009 at 6:38 AM, Bryn M. Reeves <bmr@redhat.com> wrote:
> On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
>> On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
>> > Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
>> > working? I am writing a program for working with logical volumes and it would
>> > be very helpful if I could start integrating liblvm now even if it is still
>> > buggy at the moment. Also, where can I find the files or cvs?
>> >
>> > Thank you
>>
>> I think you mean libdevmapper don't you?
>
> No, he means liblvm:
>
> http://fedoraproject.org/wiki/Features/liblvm
> http://fedoraproject.org/wiki/LVM/liblvm
>
> Patches are just starting to be merged but it's still a
> work-in-progress:
>
> http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html
>

Very cool. I just recommended libdevmapper to a project team doing some work. liblvm looks like a much better fit for them.

Greg
* Re: [linux-lvm] fsync() and LVM
From: Les Mikesell @ 2009-03-16 20:28 UTC
To: LVM general discussion and development

Greg Freemyer wrote:
>
>>>> you can't use LVM for anything that needs fsync(), including mail queues
>>>> (sendmail), mail storage (imapd), as such. So I'd really like to know.
>>> fsync() is a file system call that writes dirty buffers, and then waits
>>> for the physical writes to complete. It is only the waiting part that
>>> is broken.
>> It's a yes or no question... Fsync() either guarantees that the write is
>> committed to physical media so the application can continue knowing that
>> its own transactional expectations are met (i.e. you can crash and recover
>> that piece of data), or it is broken. If it doesn't wait for completion, it
>> can't possibly report the correct status.
>>
>
> This discussion seems a bit bizarre to me.

You can't avoid a discussion of expected but missing functionality.

> Many apps require data get
> to stable memory in a well defined way. Barriers is certainly one way
> to do that, but I don't think barriers are supported by LVM, mdraid,
> or drbd.
>
> Those are some very significant subsystems. I have to believe
> filesystems have another way to implement fsync if barriers are not
> supported in the stack of block subsystems.

If you can't get the completion status from the underlying layer, how can a filesystem possibly implement it?

> Maybe this discussion needs to move to a filesystem list, since it is
> the filesystem that is responsible for making fsync() work even in the
> absence of barriers.

I thought Linux ended up doing a sync of all the outstanding buffered data for a partition, with horrible performance, at least on ext3.
-- Les Mikesell lesmikesell@gmail.com
* Re: [linux-lvm] fsync() and LVM 2009-03-16 20:28 ` [linux-lvm] fsync() and LVM Les Mikesell @ 2009-03-16 20:54 ` Greg Freemyer 2009-03-16 21:17 ` Les Mikesell 0 siblings, 1 reply; 39+ messages in thread From: Greg Freemyer @ 2009-03-16 20:54 UTC To: LVM general discussion and development On Mon, Mar 16, 2009 at 4:28 PM, Les Mikesell <lesmikesell@gmail.com> wrote: > Greg Freemyer wrote: >> >>>>> you can't use LVM for anything that needs fsync(), including mail >>>>> queues >>>>> (sendmail), mail storage (imapd), as such. So I'd really like to know. >>>> >>>> fsync() is a file system call that writes dirty buffers, and then waits >>>> for the physical writes to complete. It is only the waiting part that >>>> is broken. >>> >>> It's a yes or no question... Fsync() either guarantees that the write is >>> committed to physical media so the application can continue knowing that >>> its own transactional expectations are met (i.e. you can crash and >>> recover >>> that piece of data), or it is broken. If it doesn't wait for completion, >>> it >>> can't possibly report the correct status. >>> >> >> This discussion seems a bit bizarre to me. > > You can't avoid a discussion of expected but missing functionality. > >> Many apps require data get >> to stable memory in a well defined way. Barriers is certainly one way >> to do that, but I don't think barriers are supported by LVM, mdraid, >> or drbd. >> >> Those are some very significant subsystems. I have to believe >> filesystems have another way to implement fsync if barriers are not >> supported in the stack of block subsystems. > > If you can't get the completion status from the underlying layer, how can a > filesystem possibly implement it? Barriers are a specific technology and they were just implemented in Linux around 2005, I think (see Documentation/block/barrier.txt). Surely there was a mechanism in place before that.
> >> Maybe this discussion needs to move to a filesystem list, since it is >> the filesystem that is responsible for making fsync() work even in the >> absence of barriers. > > I thought Linux ended up doing a sync of the entire outstanding buffered data > for a partition with horrible performance, at least on ext3. Yes, I understand fsync is horribly slow in ext3 and that may be the reason. Supposedly much better in ext4. Still if a userspace app calls fsync and in turn the filesystem does something really slow due to the lack of barriers, then this conversation should be about the poor performance of fsync() when using lvm (or mdraid, or drbd), not the total lack of fsync() support. Greg
* Re: [linux-lvm] fsync() and LVM 2009-03-16 20:54 ` Greg Freemyer @ 2009-03-16 21:17 ` Les Mikesell 2009-03-16 21:36 ` Greg Freemyer 0 siblings, 1 reply; 39+ messages in thread From: Les Mikesell @ 2009-03-16 21:17 UTC To: LVM general discussion and development Greg Freemyer wrote: > >>> Those are some very significant subsystems. I have to believe >>> filesystems have another way to implement fsync if barriers are not >>> supported in the stack of block subsystems. >> If you can't get the completion status from the underlying layer, how can a >> filesystem possibly implement it? > > Barriers are a specific technology and they were just implemented in > Linux around 2005, I think (see Documentation/block/barrier.txt). > > Surely there was a mechanism in place before that. I'm not sure that's a reasonable assumption. >>> Maybe this discussion needs to move to a filesystem list, since it is >>> the filesystem that is responsible for making fsync() work even in the >>> absence of barriers. >> I thought Linux ended up doing a sync of the entire outstanding buffered data >> for a partition with horrible performance, at least on ext3. > > Yes, I understand fsync is horribly slow in ext3 and that may be the > reason. Supposedly much better in ext4. Still if a userspace app > calls fsync and in turn the filesystem does something really slow due > to the lack of barriers, then this conversation should be about the > poor performance of fsync() when using lvm (or mdraid, or drbd), not > the total lack of fsync() support. I haven't seen anyone claim yet that there is support for fsync(), which must return the status of the completion of the operation to the application. If it does, then the discussion could turn to performance. -- Les Mikesell lesmikesell@gmail.com
* Re: [linux-lvm] fsync() and LVM 2009-03-16 21:17 ` Les Mikesell @ 2009-03-16 21:36 ` Greg Freemyer 2009-03-16 21:53 ` Les Mikesell 2009-03-16 21:57 ` Allen, Jack 0 siblings, 2 replies; 39+ messages in thread From: Greg Freemyer @ 2009-03-16 21:36 UTC To: LVM general discussion and development On Mon, Mar 16, 2009 at 5:17 PM, Les Mikesell <lesmikesell@gmail.com> wrote: > Greg Freemyer wrote: >> >>>> Those are some very significant subsystems. I have to believe >>>> filesystems have another way to implement fsync if barriers are not >>>> supported in the stack of block subsystems. >>> >>> If you can't get the completion status from the underlying layer, how can >>> a >>> filesystem possibly implement it? >> >> Barriers are a specific technology and they were just implemented in >> Linux around 2005, I think (see Documentation/block/barrier.txt). >> >> Surely there was a mechanism in place before that. > > I'm not sure that's a reasonable assumption. > >>>> Maybe this discussion needs to move to a filesystem list, since it is >>>> the filesystem that is responsible for making fsync() work even in the >>>> absence of barriers. >>> >>> I thought Linux ended up doing a sync of the entire outstanding buffered >>> data >>> for a partition with horrible performance, at least on ext3. >> >> Yes, I understand fsync is horribly slow in ext3 and that may be the >> reason. Supposedly much better in ext4. Still if a userspace app >> calls fsync and in turn the filesystem does something really slow due >> to the lack of barriers, then this conversation should be about the >> poor performance of fsync() when using lvm (or mdraid, or drbd), not >> the total lack of fsync() support. > > I haven't seen anyone claim yet that there is support for fsync(), which > must return the status of the completion of the operation to the > application. If it does, then the discussion could turn to performance.
Is your specific interest in ext3? If so, I suggest you post a question there along the lines of: Device Mapper does not support barriers if more than one physical device is in use by the LV. If I'm using ext3 on a LV and I call fsync() from user space, how is fsync() implemented? Or is it not? The ext4 list is <linux-ext4@vger.kernel.org>. I see some ext3 stuff posted there, or it may have its own list. Greg
* Re: [linux-lvm] fsync() and LVM 2009-03-16 21:36 ` Greg Freemyer @ 2009-03-16 21:53 ` Les Mikesell 2009-03-16 22:51 ` Joshua D. Drake 0 siblings, 1 reply; 39+ messages in thread From: Les Mikesell @ 2009-03-16 21:53 UTC To: LVM general discussion and development Greg Freemyer wrote: >> I haven't seen anyone claim yet that there is support for fsync(), which >> must return the status of the completion of the operation to the >> application. If it does, then the discussion could turn to performance. >> > Is your specific interest in ext3? No, it is whether a useful fsync() is possible over LVM. > If so, I suggest you post a > question there along the lines of: > > Device Mapper does not support barriers if more than one physical > device is in use by the LV. If I'm using ext3 on a LV and I call > fsync() from user space, how is fsync() implemented? Or is it not? The point of fsync() is for an application to know that a write has been safely committed, as for example sendmail would do before acknowledging to the sender that a message has been accepted. The question isn't whether an application can call fsync() but rather whether its return status is lying, making the underlying storage unsuitable for anything that needs reliability. -- Les Mikesell lesmikesell@gmail.com
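Les's point is that an application decides its next step (for example, replying "accepted" to a sending MTA) from fsync()'s return value. A minimal sketch of that pattern in C, assuming a POSIX system; `store_durably()` is a hypothetical helper, not code from sendmail or any library. A 0 return only means the kernel reported success, which is exactly the guarantee this thread questions for LVM/device-mapper stacks on top of disks with write-back caches.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `len` bytes from `data` in `path`.
 * Returns 0 only if every write() succeeded AND fsync() reported success;
 * the caller should acknowledge the message/transaction only on 0.
 */
int store_durably(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {                 /* handle short writes */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {             /* not known to be stable: do not ack */
        close(fd);
        return -1;
    }
    return close(fd);                 /* close() may report a deferred error */
}
```

A caller would send its acknowledgement only after something like `if (store_durably("queue/msg123", buf, n) == 0)` succeeds (the path here is purely illustrative).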
* Re: [linux-lvm] fsync() and LVM 2009-03-16 21:53 ` Les Mikesell @ 2009-03-16 22:51 ` Joshua D. Drake 2009-03-17 15:33 ` Joshua D. Drake 0 siblings, 1 reply; 39+ messages in thread From: Joshua D. Drake @ 2009-03-16 22:51 UTC To: LVM general discussion and development On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote: > The point of fsync() is for an application to know that a write has been > safely committed, as for example sendmail would do before acknowledging > to the sender that a message has been accepted. The question isn't > whether an application can call fsync() but rather whether its return > status is lying, making the underlying storage unsuitable for anything > that needs reliability. Right and for databases this is critical. So enlightenment here would be good. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
* Re: [linux-lvm] fsync() and LVM 2009-03-16 22:51 ` Joshua D. Drake @ 2009-03-17 15:33 ` Joshua D. Drake 2009-03-19 9:20 ` Tim Post 0 siblings, 1 reply; 39+ messages in thread From: Joshua D. Drake @ 2009-03-17 15:33 UTC To: LVM general discussion and development On Mon, 2009-03-16 at 15:51 -0700, Joshua D. Drake wrote: > On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote: > > > The point of fsync() is for an application to know that a write has been > > safely committed, as for example sendmail would do before acknowledging > > to the sender that a message has been accepted. The question isn't > > whether an application can call fsync() but rather whether its return > > status is lying, making the underlying storage unsuitable for anything > > that needs reliability. > > Right and for databases this is critical. So enlightenment here would be > good. Anyone? Joshua D. Drake
* Re: [linux-lvm] fsync() and LVM 2009-03-17 15:33 ` Joshua D. Drake @ 2009-03-19 9:20 ` Tim Post 0 siblings, 0 replies; 39+ messages in thread From: Tim Post @ 2009-03-19 9:20 UTC To: jd, LVM general discussion and development On Tue, 2009-03-17 at 08:33 -0700, Joshua D. Drake wrote: > On Mon, 2009-03-16 at 15:51 -0700, Joshua D. Drake wrote: > > On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote: > > > > > The point of fsync() is for an application to know that a write has been > > > safely committed, as for example sendmail would do before acknowledging > > > to the sender that a message has been accepted. The question isn't > > > whether an application can call fsync() but rather whether its return > > > status is lying, making the underlying storage unsuitable for anything > > > that needs reliability. > > > > Right and for databases this is critical. So enlightenment here would be > > good. > > Anyone? > > Joshua D. Drake If a logical volume spans physical devices where write caching is enabled, the results of fsync() cannot be trusted. This is an issue with device mapper; LVM is one of a few possible customers of DM. Now it gets interesting: enter virtualization. When you have something like this: fsync -> guest block device -> block tap driver -> CLVM -> iscsi -> storage -> physical disk, even if device mapper passed along the write barrier, would it be reliable? Is every part of that chain going to pass the same along, and how many opportunities for re-ordering are presented in the above? So, even if it's fixed in DM, can fsync() still be trusted? I think, at the least, more testing should be done with various configurations even after a suitable patch to DM is merged. What about PGSQL users using some kind of elastic hosting? Given the craze in 'cloud' technology, it's an important question to ask (and research). Cheers, --Tim
* RE: [linux-lvm] fsync() and LVM 2009-03-16 21:36 ` Greg Freemyer 2009-03-16 21:53 ` Les Mikesell @ 2009-03-16 21:57 ` Allen, Jack 1 sibling, 0 replies; 39+ messages in thread From: Allen, Jack @ 2009-03-16 21:57 UTC To: LVM general discussion and development -----Original Message----- From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Greg Freemyer Sent: Monday, March 16, 2009 5:36 PM To: LVM general discussion and development Subject: Re: [linux-lvm] fsync() and LVM On Mon, Mar 16, 2009 at 5:17 PM, Les Mikesell <lesmikesell@gmail.com> wrote: > Greg Freemyer wrote: >> >>>> Those are some very significant subsystems. I have to believe >>>> filesystems have another way to implement fsync if barriers are not >>>> supported in the stack of block subsystems. >>> >>> If you can't get the completion status from the underlying layer, how can >>> a >>> filesystem possibly implement it? >> >> Barriers are a specific technology and they were just implemented in >> Linux around 2005, I think (see Documentation/block/barrier.txt). >> >> Surely there was a mechanism in place before that. > > I'm not sure that's a reasonable assumption. > >>>> Maybe this discussion needs to move to a filesystem list, since it is >>>> the filesystem that is responsible for making fsync() work even in the >>>> absence of barriers. >>> >>> I thought Linux ended up doing a sync of the entire outstanding buffered >>> data >>> for a partition with horrible performance, at least on ext3. >> >> Yes, I understand fsync is horribly slow in ext3 and that may be the >> reason. Supposedly much better in ext4. Still if a userspace app >> calls fsync and in turn the filesystem does something really slow due >> to the lack of barriers, then this conversation should be about the >> poor performance of fsync() when using lvm (or mdraid, or drbd), not >> the total lack of fsync() support.
> > I haven't seen anyone claim yet that there is support for fsync(), which > must return the status of the completion of the operation to the > application. If it does, then the discussion could turn to performance. Is your specific interest in ext3? If so, I suggest you post a question there along the lines of: Device Mapper does not support barriers if more than one physical device is in use by the LV. If I'm using ext3 on a LV and I call fsync() from user space, how is fsync() implemented? Or is it not? The ext4 list is <linux-ext4@vger.kernel.org>. I see some ext3 stuff posted there, or it may have its own list. Greg -- ======================================== So what happens if there is a database implemented directly on a Logical Volume, with no File System involved at all? Should the fsync man page describe what happens when it is used on each type of File System, Logical Volume, disk partition, and/or combination? ----- Thanks: Jack Allen
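For Jack's case of a database written straight to a logical volume, the durability request goes to the block device node itself rather than through a filesystem. One way to ask for it is to open the device with `O_DSYNC`, so each write returns only once the kernel reports the data written, much as if every write were followed by fdatasync(). A minimal sketch, assuming a POSIX system; `write_block_dsync()` is a hypothetical name and the device path is whatever the caller passes. Whether that completion report is honest below the device-mapper layer is the open question of this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes at byte `offset` of `dev_path`
 * (an LV's device node, or any existing file) through an O_DSYNC
 * descriptor, so pwrite() completes only after the kernel reports the
 * data as written. Returns bytes written, or -1 on error.
 */
ssize_t write_block_dsync(const char *dev_path, const void *buf,
                          size_t len, off_t offset)
{
    /* No O_CREAT: a block device node must already exist. */
    int fd = open(dev_path, O_WRONLY | O_DSYNC);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, buf, len, offset);
    if (close(fd) != 0 && n >= 0)
        n = -1;                       /* surface a deferred error from close() */
    return n;
}
```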
* Re: [linux-lvm] fsync() and LVM 2009-03-16 17:17 ` Stuart D. Gathman 2009-03-16 18:50 ` Les Mikesell @ 2009-03-17 16:00 ` Marco Colombo 2009-03-17 17:40 ` Stuart D. Gathman 1 sibling, 1 reply; 39+ messages in thread From: Marco Colombo @ 2009-03-17 16:00 UTC To: LVM general discussion and development [-- Attachment #1: Type: text/plain, Size: 4543 bytes --] Stuart D. Gathman wrote: > That is clearly wrong - since fsync() isn't LVM's responsibility. > I think they mean that fsync() can't guarantee that any writes are > actually on the platter. Even if the disk cache is in write-thru mode, that is. >> that data doesn't even get to the controller, and it doesn't matter >> if the disks have write caches enabled or not. Or if they have battery backed >> caches. Please read the thread I linked. If what they say is true, > > That is clearly wrong. If writes don't work, nothing works. It's the flush (= write NOW) that is supposedly not working, not the write. Writes happen, just later and potentially not in order. You seem to assume that fsync() is the only way to have the data written. That's clearly not the case: most userland processes just issue write(), never fsync(), and data gets written anyway, sooner or later. >> you can't use LVM for anything that needs fsync(), including mail queues >> (sendmail), mail storage (imapd), as such. So I'd really like to know. > > fsync() is a file system call that writes dirty buffers, Sure, but it's not the only way to have dirty pages flushed. There's a kernel thread that flushes them every now and then, and there's also memory pressure. So a broken fsync() can go unnoticed: you become aware of it if and only if 1) you run some application that needs it (most don't even use it); 2) the system crashes (power loss); 3) you are unlucky enough to hit the window of vulnerability. If any of these conditions is not met, you won't be aware of a malfunctioning fsync().
But I think I understand what you mean: if the API to flush to physical storage is the same (used by fsync(), by pdflush, by the VM system) then you're right, everything is broken. But I've been using LVM for years now, so I'm assuming that's not the case. :) > and then waits > for the physical writes to complete. It is only the waiting part that > is broken. Half-broken is broken. And the bigger issue here isn't even the delay. The issue is ordering. For a database, losing the last transactions is bad enough; losing transactions in the middle of the timeline is even worse. For the mail subsystems there's almost no ordering requirement, but losing messages is still no good. --------------- Ehm, I've decided to write a small test program. My system is Fedora 7, so not exactly recent. My setup: /home is an LV belonging to VG 'vg_data', whose only PV is /dev/md6. /dev/md6 is a RAID1 md device whose members are /dev/sda10 and /dev/sdb10. /dev/sda and /dev/sdb are both Seagate ST3320620AS SATA disks. The filesystem is EXT3, mounted with noatime,data=ordered. The attached program writes the same block to a file N times (looping on lseek/write). Depending on how it's compiled, it issues an fdatasync() after each write. Here are the results, for 32MB of data written: $ time ./test_nosync real 0m0.056s user 0m0.004s sys 0m0.052s Clearly, no disk activity here. $ time ./test_sync real 0m2.070s user 0m0.002s sys 0m0.152s Now the same after hdparm -W0 /dev/sda; hdparm -W0 /dev/sdb: $ time ./test_sync real 1m16.431s user 0m0.004s sys 0m0.273s These are 4096 "transactions" of size 8192, without the overhead of allocating new blocks (it writes to the same block over and over). The first test is meaningless (the writes are never really committed). In the second test it's about 2000 transactions per second. Too many. In the third test, I got only about 50 transactions per second, which makes a lot of sense.
It seems to me that in my setup, disabling the caches on the disks does bring data to the platters, and that no one is "lying" about fsync. Now I'm _really_ confused. (The following isn't meaningful for the discussion.) For the curious among you (I was), I commented out the lseek(). For the _nosync version it's the same (1/2 a second). For the _sync version, with -W1 I get: $ time ./test_sync real 0m48.816s user 0m0.002s sys 0m0.483s and with -W0: $ time ./test_sync real 3m6.674s user 0m0.006s sys 0m0.526s Since all the tests were done deleting the file each time, I think what happens here is that the file is increasing in size, so each fdatasync() also triggers a write of the inode. It's two writes per loop. So I tried keeping the file around, having my test program write to preallocated blocks. With -W1: $ time ./test_sync real 0m11.253s user 0m0.001s sys 0m0.244s with -W0: $ time ./test_sync real 0m46.353s user 0m0.005s sys 0m0.249s .TM. [-- Attachment #2: test.c --] [-- Type: text/x-csrc, Size: 807 bytes --]
/*
 * compile with -DDO_FSYNC=1 and then with -DDO_FSYNC=0
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#if !defined(DO_FSYNC)
# error "You must define DO_FSYNC"
#endif

#define MYBUFSIZ BUFSIZ                      /* 8192 on glibc: 4096 writes in total */
#define BYTES_TO_WRITE (32*1024*1024)        /* 32MB */

int main(int argc, char *argv[])
{
    int fd, i;
    ssize_t rc;
    char buf[MYBUFSIZ] = { '\0', };

    fd = open("testfile", O_WRONLY|O_CREAT, 0600);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    for (i = 0; i < (BYTES_TO_WRITE/MYBUFSIZ); i++) {
        /* rewind so the same block is overwritten on every iteration */
        if (lseek(fd, 0, SEEK_SET) < 0) {
            perror("lseek");
            exit(1);
        }
        rc = write(fd, buf, sizeof(buf));
        if (rc < 0) {
            perror("write");
            exit(1);
        }
#if DO_FSYNC
        /* check fdatasync()'s own return value, not the one left over from write() */
        if (fdatasync(fd) < 0) {
            perror("fdatasync");
            exit(1);
        }
#endif
    }

    close(fd);
    return 0;
}
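Marco derived his transactions-per-second figures by hand from `time` output. A sketch of a variant of test.c that measures the rate directly, assuming a POSIX system; `fdatasync_rate()` is a hypothetical name, and swapping `pwrite()` at offset 0 for the lseek()/write() pair and timing with `clock_gettime(CLOCK_MONOTONIC)` are choices made here, not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical variant of test.c: rewrite the same `bufsize`-byte block at
 * offset 0 `count` times, calling fdatasync() after each write, and return
 * the observed rate in syncs per second (-1.0 on any error).
 */
double fdatasync_rate(const char *path, int count, size_t bufsize)
{
    static char buf[65536];           /* zero-filled, like test.c's buffer */
    if (count <= 0 || bufsize == 0 || bufsize > sizeof buf)
        return -1.0;

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        /* pwrite() at offset 0 replaces test.c's lseek() + write() pair */
        if (pwrite(fd, buf, bufsize, 0) != (ssize_t)bufsize ||
            fdatasync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? count / secs : -1.0;
}
```

Calling `fdatasync_rate("testfile", 4096, 8192)` mirrors the attachment's 32MB run. Going by Marco's timings, that run took about 2 s with the drives' write cache enabled and about 76 s with `hdparm -W0`, so the rates it reports on similar hardware would be expected to differ by a comparable factor.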
* Re: [linux-lvm] fsync() and LVM 2009-03-17 16:00 ` Marco Colombo @ 2009-03-17 17:40 ` Stuart D. Gathman 2009-03-17 18:17 ` Les Mikesell 0 siblings, 1 reply; 39+ messages in thread From: Stuart D. Gathman @ 2009-03-17 17:40 UTC To: LVM general discussion and development On Tue, 17 Mar 2009, Marco Colombo wrote: > It seems to me that in my setup, disabling the caches on the disks does > bring data to the platters, and that no one is "lying" about fsync. > > Now I'm _really_ confused. That's been my claim all along - that the broken fsync only affects the on-disk cache. LVM itself does not reorder writes in any way - it just fails to pass along the write barrier. fsync() does *start* writing the dirty buffers (implemented in the fs code). It just doesn't wait for the writes to finish getting to the platters. Apparently, it does wait for the write to get to the drive (but I'm not certain). -- Stuart D. Gathman <stuart@bmsi.com> Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154 "Confutatis maledictis, flammis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
* Re: [linux-lvm] fsync() and LVM 2009-03-17 17:40 ` Stuart D. Gathman @ 2009-03-17 18:17 ` Les Mikesell 2009-03-18 0:37 ` Marco Colombo 0 siblings, 1 reply; 39+ messages in thread From: Les Mikesell @ 2009-03-17 18:17 UTC To: LVM general discussion and development Stuart D. Gathman wrote: > On Tue, 17 Mar 2009, Marco Colombo wrote: > >> It seems to me that in my setup, disabling the caches on the disks does >> bring data to the platters, and that no one is "lying" about fsync. >> >> Now I'm _really_ confused. > > That's been my claim all along - that the broken fsync only affects > the on-disk cache. LVM itself does not reorder writes in any way - it just > fails to pass along the write barrier. fsync() does *start* writing > the dirty buffers (implemented in the fs code). It just doesn't > wait for the writes to finish getting to the platters. Apparently, > it does wait for the write to get to the drive (but I'm not certain). Given that fsync() is supposed to return the status of the completion of the physical write, that sounds broken to me. Do the LVs in question here have more than one underlying device, and does it matter? -- Les Mikesell lesmikesell@gmail.com
* Re: [linux-lvm] fsync() and LVM 2009-03-17 18:17 ` Les Mikesell @ 2009-03-18 0:37 ` Marco Colombo 0 siblings, 0 replies; 39+ messages in thread From: Marco Colombo @ 2009-03-18 0:37 UTC To: LVM general discussion and development Les Mikesell wrote: > Stuart D. Gathman wrote: >> >> That's been my claim all along - that the broken fsync only affects >> the on-disk cache. LVM itself does not reorder writes in any way - it just >> fails to pass along the write barrier. fsync() does *start* writing >> the dirty buffers (implemented in the fs code). It just doesn't wait >> for the writes to finish getting to the platters. Apparently, >> it does wait for the write to get to the drive (but I'm not certain). > > Given that fsync() is supposed to return the status of the completion of > the physical write, that sounds broken to me. Do the LVs in question > here have more than one underlying device, and does it matter? > According to my tests, you get a roughly 37x speedup (2.07s vs 1m16s) when you turn the cache on. It means that fsync is waiting for something to happen, and this "something" happens about 37 times faster only when you turn the disk write-back cache on. It seems to me that the only explanation is that fsync is waiting for disk I/O to complete (and not just to begin, otherwise the times would be the same). With the cache enabled, the disk reports completion when the data is in the cache (write-back behaviour); with the cache disabled, it waits for the data to be on the platters (write-thru behaviour). .TM.
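The speedup can be checked against the timings in Marco's test message: 32MB rewritten in 8192-byte blocks is 4096 fdatasync() calls, so 2.070 s works out to about 1979 syncs per second with the write cache on, 76.431 s to about 53.6 per second with it off, and the ratio of the two run times is about 36.9. A small sketch of that arithmetic; the constants are copied from the test description and the function names are hypothetical.

```c
#include <stdio.h>

/* Figures from Marco's test: 32MB rewritten in BUFSIZ (8192-byte) blocks. */
#define TOTAL_BYTES (32L * 1024 * 1024)
#define BLOCK_BYTES 8192L
#define N_SYNCS (TOTAL_BYTES / BLOCK_BYTES)   /* 4096 fdatasync() calls */

/* Synchronous "transactions" per second for a run lasting `secs` seconds. */
double syncs_per_second(double secs)
{
    return (double)N_SYNCS / secs;
}

/* How many times faster the cache-on run was than the cache-off run. */
double speedup(double cache_off_secs, double cache_on_secs)
{
    return cache_off_secs / cache_on_secs;
}

/* Print the three figures for the lseek()/write()/fdatasync() runs. */
void print_summary(void)
{
    printf("write cache on : %.1f syncs/s\n", syncs_per_second(2.070));
    printf("write cache off: %.1f syncs/s\n", syncs_per_second(76.431));
    printf("speedup        : %.1fx\n", speedup(76.431, 2.070));
}
```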
* RE: [linux-lvm] fsync() and LVM 2009-03-14 14:31 ` Stuart D. Gathman 2009-03-15 0:51 ` Marco Colombo @ 2009-03-15 8:51 ` Dietmar Maurer 2009-03-15 23:31 ` Marco Colombo 2009-03-17 18:12 ` Les Mikesell 1 sibling, 2 replies; 39+ messages in thread From: Dietmar Maurer @ 2009-03-15 8:51 UTC To: LVM general discussion and development > > Does that mean I should never use more than one device if I have > > applications depending on fsync (databases)? > > It just means that write barriers won't get passed to the device. > This is only a problem if the devices have write caches. But fsync is implemented using 'write barriers' - so fsync does not work? After fsync, all data should be sent from the OS to the disk controller: a.) does this work perfectly using LVM? b.) does this not work at all using LVM? c.) does it work when you use one single physical drive with LVM? I am confused. The thread on the postfix list claims that it does not work at all? - Dietmar
* Re: [linux-lvm] fsync() and LVM 2009-03-15 8:51 ` Dietmar Maurer @ 2009-03-15 23:31 ` Marco Colombo 2009-03-17 18:12 ` Les Mikesell 1 sibling, 0 replies; 39+ messages in thread From: Marco Colombo @ 2009-03-15 23:31 UTC To: LVM general discussion and development [Please forgive double-posting, I'm not sure my previous attempt succeeded] Dietmar Maurer wrote: >>> Does that mean I should never use more than one device if I have >>> applications depending on fsync (databases)? >> It just means that write barriers won't get passed to the device. >> This is only a problem if the devices have write caches. > > But fsync is implemented using 'write barriers' - so fsync does not > work? > > After fsync, all data should be sent from the OS to the disk controller: > > a.) does this work perfectly using LVM? > > b.) does this not work at all using LVM? > > c.) does it work when you use one single physical drive with LVM? > > I am confused. The thread on the postfix list claims that it does not > work at > all? Well, it's on the PostgreSQL list, not postfix. But it may affect postfix as well. Quoting postfix documentation: Gory details: the Postfix mail queue requires that (1) the file system can rename a file to a near-by directory without changing the file's inode number, and that (2) mail is safely stored after fsync() of that file (not its parent directory) returns successfully, even when that file is renamed to a near-by directory at some later point in time. If fsync() doesn't work, point (2) is not fulfilled. Please note: what was reported on the PostgreSQL list is not speculation. It comes from measurements. Benchmarks show transaction rates that are too high, just as if fsync() was disabled. The explanation (they provided) is that LVM does not honor fsync(). From some reading I've done, I'm not sure. Is it blkdev_issue_flush() we're talking about? Please see: http://lkml.org/lkml/2007/5/25/71 Is an LVM (well, device mapper) device still a "FLUSHABLE device" by that definition?
Apparently it's ok not to support BIO_RW_BARRIER, as long as you support blkdev_issue_flush(). Has something changed since then? How would you classify an LVM device? SAFE, FLUSHABLE, BARRIER or something else (UNSAFE)? .TM.
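The Postfix requirement Marco quotes (fsync() the queue file itself, then rename() it into a nearby directory) can be illustrated with a short sketch. It assumes POSIX calls only; `enqueue()` and the file names are hypothetical, and this is not Postfix's actual queue code. It deliberately does not fsync() the parent directory, mirroring the stated requirement that fsync() of the file alone must be enough.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>        /* rename() */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: create `tmp_path`, write `msg`, fsync() the file,
 * and only then rename() it to `final_path` (same file system, so the
 * inode number is unchanged). Returns 0 on success, -1 on any failure.
 */
int enqueue(const char *tmp_path, const char *final_path, const char *msg)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    /* A short write is treated as failure to keep the sketch small. */
    if (write(fd, msg, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }
    /* Only a file whose fsync() succeeded is moved into the queue. */
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

If fsync() returns success without the data being stable, as suspected in this thread, the rename() still happens, and that is precisely the case in which point (2) of the Postfix requirement is violated.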
* Re: [linux-lvm] fsync() and LVM 2009-03-15 8:51 ` Dietmar Maurer 2009-03-15 23:31 ` Marco Colombo @ 2009-03-17 18:12 ` Les Mikesell 2009-03-17 18:19 ` Dietmar Maurer 1 sibling, 1 reply; 39+ messages in thread From: Les Mikesell @ 2009-03-17 18:12 UTC To: LVM general discussion and development Dietmar Maurer wrote: >>> Does that mean I should never use more than one device if I have >>> applications depending on fsync (databases)? >> It just means that write barriers won't get passed to the device. >> This is only a problem if the devices have write caches. > > But fsync is implemented using 'write barriers' - so fsync does not > work? > > After fsync, all data should be sent from the OS to the disk controller: > > a.) does this work perfectly using LVM? > > b.) does this not work at all using LVM? > > c.) does it work when you use one single physical drive with LVM? > > I am confused. The thread on the postfix list claims that it does not > work at > all? Everything will seem to work until you have an inconvenient crash or disk error. That is, data will be written normally - whether you fsync or not. The point of fsync() though, is for an application to confirm that the file is committed to stable media and will be recoverable even if the application (or OS) crashes or the system loses power. The correct next action of the application will depend on the return status of the fsync() operation (e.g., acknowledging receipt of a mail message, considering a database change to be committed, etc.). What I believe is happening is that fsync() always returns as though it were successful even though the underlying operations haven't completed. That's ummm..., optimistic at best. But, everything will still work (and more quickly) as long as the physical write of the file and associated directory metadata eventually succeeds.
Realistically, for most things it doesn't matter because for critical data you still have to deal with the possibility of a disk write that succeeds being unreadable later for a variety of reasons - and the rest isn't critical anyway. However, it would be good to know exactly what to expect here. -- Les Mikesell lesmikesell@gmail.com
* RE: [linux-lvm] fsync() and LVM 2009-03-17 18:12 ` Les Mikesell @ 2009-03-17 18:19 ` Dietmar Maurer 0 siblings, 0 replies; 39+ messages in thread From: Dietmar Maurer @ 2009-03-17 18:19 UTC To: LVM general discussion and development > Dietmar Maurer wrote: > >>> Does that mean I should never use more than one device if I have > >>> applications depending on fsync (databases)? > >> It just means that write barriers won't get passed to the device. > >> This is only a problem if the devices have write caches. > > > > But fsync is implemented using 'write barriers' - so fsync does not > > work? > > > > After fsync, all data should be sent from the OS to the disk > controller: > > > > a.) does this work perfectly using LVM? > > > > b.) does this not work at all using LVM? > > > > c.) does it work when you use one single physical drive with LVM? Please, can someone answer these questions? - Dietmar
Thread overview: 39+ messages
2009-03-13 17:46 [linux-lvm] fsync() and LVM Marco Colombo
2009-03-13 20:08 ` Stuart D. Gathman
2009-03-13 20:29 ` Ben Chobot
2009-03-13 20:38 ` Alasdair G Kergon
2009-03-14 3:16 ` Marco Colombo
2009-03-14 9:07 ` Dietmar Maurer
2009-03-14 14:31 ` Stuart D. Gathman
2009-03-15 0:51 ` Marco Colombo
2009-03-16 11:02 ` Charles Marcus
2009-03-16 11:05 ` Martin Schröder
2009-03-16 11:18 ` Charles Marcus
2009-03-16 11:25 ` Dietmar Maurer
2009-03-16 14:36 ` Marco Colombo
2009-03-16 17:13 ` Stuart D. Gathman
2009-03-16 17:17 ` Stuart D. Gathman
2009-03-16 18:50 ` Les Mikesell
2009-03-16 19:36 ` Greg Freemyer
2009-03-16 19:55 ` [linux-lvm] liblvm status question ben scott
2009-03-16 20:58 ` Greg Freemyer
2009-03-17 10:38 ` Bryn M. Reeves
2009-03-17 18:42 ` ben scott
2009-03-17 20:52 ` Greg Freemyer
2009-03-16 20:28 ` [linux-lvm] fsync() and LVM Les Mikesell
2009-03-16 20:54 ` Greg Freemyer
2009-03-16 21:17 ` Les Mikesell
2009-03-16 21:36 ` Greg Freemyer
2009-03-16 21:53 ` Les Mikesell
2009-03-16 22:51 ` Joshua D. Drake
2009-03-17 15:33 ` Joshua D. Drake
2009-03-19 9:20 ` Tim Post
2009-03-16 21:57 ` Allen, Jack
2009-03-17 16:00 ` Marco Colombo
2009-03-17 17:40 ` Stuart D. Gathman
2009-03-17 18:17 ` Les Mikesell
2009-03-18 0:37 ` Marco Colombo
2009-03-15 8:51 ` Dietmar Maurer
2009-03-15 23:31 ` Marco Colombo
2009-03-17 18:12 ` Les Mikesell
2009-03-17 18:19 ` Dietmar Maurer