* Re: [Btrfs-devel] transaction ioctls
From: Zach Brown @ 2008-04-22 20:29 UTC
  To: Sage Weil; +Cc: btrfs-devel, linux-btrfs

> A misbehaving application could also deliberately hold a transaction open,
> effectively locking up the FS, so it may make sense to restrict something
> like this to root or something.

I suspect it doesn't have to be deliberate.

Have you tried this under memory pressure?  I wonder if the application
can get stuck waiting for memory which will only be freed once the
transaction closes.

I'm reasonably sure that we've discussed this persistent theoretical
problem with these kinds of interfaces ;).

- z
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-22 20:41 UTC
  To: btrfs-devel; +Cc: Zach Brown, Sage Weil, linux-btrfs

On Tuesday 22 April 2008, Zach Brown wrote:
> > A misbehaving application could also deliberately hold a transaction
> > open, effectively locking up the FS, so it may make sense to restrict
> > something like this to root or something.
>
> I suspect it doesn't have to be deliberate.
>
> Have you tried this under memory pressure?  I wonder if the application
> can get stuck waiting for memory which will only be freed once the
> transaction closes.

This isn't as big an issue; btrfs doesn't pin pages while the transaction
is running.  There are some accounting rbtrees that grow while the
transaction is running, but it isn't like reiserfs v3 or jbd, which have
physical blocks on disk pinned.

>
> I'm reasonably sure that we've discussed this persistent theoretical
> problem with these kinds of interfaces ;).

I do agree it isn't practical for anything other than a tightly controlled
interface.  It might make sense to create specific ioctls or syscalls for
the operations you need to combine.

Perhaps a generic mechanism that can link a bunch of async syscalls
together within a single framework.

Ok, really though, I seem to remember that ceph needed to do file + xattr
operations in one atomic shot, were there others?

-chris
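Chris's suggestion of a dedicated call for the operations Ceph needs to combine could take the form of a single batch entry point. The sketch below is a toy in-memory model. `ToyFS`, `WriteOp`, `SetXattrOp` and `apply_batch` are hypothetical names, not btrfs ioctls. The model validates the whole batch before applying anything. That is only enough for atomicity here because in-memory application cannot fail halfway. A real filesystem would still need its transaction machinery to be crash-safe.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    data: bytearray = field(default_factory=bytearray)
    xattrs: dict = field(default_factory=dict)

@dataclass(frozen=True)
class WriteOp:          # hypothetical op descriptor: write buf at offset
    path: str
    offset: int
    buf: bytes

@dataclass(frozen=True)
class SetXattrOp:       # hypothetical op descriptor: set one xattr
    path: str
    name: str
    value: bytes

class ToyFS:
    """In-memory stand-in for a filesystem; not btrfs."""

    def __init__(self):
        self.files = {}  # path -> Inode

    def apply_batch(self, ops):
        """Apply every op or none: check the whole batch before touching state."""
        for op in ops:
            if not isinstance(op, (WriteOp, SetXattrOp)):
                raise TypeError(f"unsupported op {op!r}")
            if op.path not in self.files:
                raise FileNotFoundError(op.path)
            if isinstance(op, WriteOp) and op.offset < 0:
                raise ValueError("negative offset")
        # Past this point nothing can fail, so the batch lands as a unit.
        for op in ops:
            inode = self.files[op.path]
            if isinstance(op, WriteOp):
                end = op.offset + len(op.buf)
                if len(inode.data) < end:
                    inode.data.extend(b"\0" * (end - len(inode.data)))
                inode.data[op.offset:end] = op.buf
            else:
                inode.xattrs[op.name] = op.value
```

Sage's four-step pattern (write(a), setxattr(a), write(b), setxattr(b)) would be one `apply_batch` call. If any path is missing, none of the four steps takes effect.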
* Re: [Btrfs-devel] transaction ioctls
From: Sage Weil @ 2008-04-22 20:52 UTC
  To: Chris Mason; +Cc: btrfs-devel, Zach Brown, linux-btrfs

On Tue, 22 Apr 2008, Chris Mason wrote:
> Ok, really though, I seem to remember that ceph needed to do file + xattr
> operations in one atomic shot, were there others?

The transactions generally look like

    write(a)
    setxattr(a)
    write(b)
    setxattr(b)

It _could_ be broken down into write intent, do X, log X, such that the
atomicity isn't strictly necessary, but it'd be so much nicer to just wrap
things up into tidy transactions.

sage
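Sage's fallback, "write intent, do X, log X", is essentially an application-level intent log. The minimal sketch below makes two assumptions. First, operations are idempotent: a write at a fixed offset, or setting an xattr to a fixed value. Second, appends to the journal are durable. `IntentLogStore`, `run` and `recover` are hypothetical names, and the "disk" is a Python dict.

```python
import json

class IntentLogStore:
    """Toy model: `files` is the on-disk state, `log` an append-only journal.
    Ops are idempotent, so replaying an unfinished intent after a crash is safe."""

    def __init__(self):
        self.files = {}  # path -> {"data": str, "xattrs": dict}
        self.log = []    # JSON records, assumed durable once appended

    def _apply(self, op):
        f = self.files.setdefault(op["path"], {"data": "", "xattrs": {}})
        if op["kind"] == "write":
            d = f["data"].ljust(op["offset"], "\0")
            end = op["offset"] + len(op["buf"])
            f["data"] = d[:op["offset"]] + op["buf"] + d[end:]
        else:  # "setxattr"
            f["xattrs"][op["name"]] = op["value"]

    def run(self, txid, ops, crash_after=None):
        self.log.append(json.dumps({"intent": txid, "ops": ops}))  # write intent
        for i, op in enumerate(ops):                                # do X
            if crash_after is not None and i == crash_after:
                return False  # simulated crash: no completion record
            self._apply(op)
        self.log.append(json.dumps({"done": txid}))                # log X
        return True

    def recover(self):
        """Redo every intent that has no matching completion record."""
        records = [json.loads(r) for r in self.log]
        done = {r["done"] for r in records if "done" in r}
        for r in records:
            if "intent" in r and r["intent"] not in done:
                for op in r["ops"]:
                    self._apply(op)
                self.log.append(json.dumps({"done": r["intent"]}))
```

The cost compared with a filesystem transaction is that every operation is written twice, once in the log and once in place. That is the overhead Sage would like to avoid.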
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-22 20:55 UTC
  To: Sage Weil; +Cc: btrfs-devel, Zach Brown, linux-btrfs

On Tuesday 22 April 2008, Sage Weil wrote:
> On Tue, 22 Apr 2008, Chris Mason wrote:
> > Ok, really though, I seem to remember that ceph needed to do file + xattr
> > operations in one atomic shot, were there others?
>
> The transactions generally look like
>
> write(a)
> setxattr(a)
> write(b)
> setxattr(b)

Hmm, is this whole thing the atomic unit, or can a and b be done separately?

-chris
* Re: [Btrfs-devel] transaction ioctls
From: Sage Weil @ 2008-04-22 20:56 UTC
  To: Chris Mason; +Cc: btrfs-devel, Zach Brown, linux-btrfs

On Tue, 22 Apr 2008, Chris Mason wrote:
> On Tuesday 22 April 2008, Sage Weil wrote:
> > On Tue, 22 Apr 2008, Chris Mason wrote:
> > > Ok, really though, I seem to remember that ceph needed to do file + xattr
> > > operations in one atomic shot, were there others?
> >
> > The transactions generally look like
> >
> > write(a)
> > setxattr(a)
> > write(b)
> > setxattr(b)
>
> Hmm, is this whole thing the atomic unit, or can a and b be done separately?

The whole thing.

sage
* Re: [Btrfs-devel] transaction ioctls
From: Evgeniy Polyakov @ 2008-04-22 21:05 UTC
  To: Chris Mason; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

Hi.

On Tue, Apr 22, 2008 at 04:55:37PM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > The transactions generally look like
> >
> > write(a)
> > setxattr(a)
> > write(b)
> > setxattr(b)
>
> Hmm, is this whole thing the atomic unit, or can a and b be done separately?

No, the main idea is to bind very different operations together and make
them look atomic from the userspace point of view. But transactions are
nothing without the ability to correctly unroll them on demand.
A transaction can include any operation on data and metadata.

--
Evgeniy Polyakov
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-23 12:50 UTC
  To: Evgeniy Polyakov; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Tuesday 22 April 2008, Evgeniy Polyakov wrote:
> Hi.
>
> On Tue, Apr 22, 2008 at 04:55:37PM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > > The transactions generally look like
> > >
> > > write(a)
> > > setxattr(a)
> > > write(b)
> > > setxattr(b)
> >
> > Hmm, is this whole thing the atomic unit, or can a and b be done
> > separately?
>
> No, the main idea is to bind very different operations together and make
> them look atomic from the userspace point of view. But transactions are
> nothing without the ability to correctly unroll them on demand.
> A transaction can include any operation on data and metadata.

Transaction rollback from a filesystem point of view is a reboot.  Real
database style transactions with rollback and isolation from other procs etc
etc are outside the scope of Btrfs.

-chris
* Re: [Btrfs-devel] transaction ioctls
From: Evgeniy Polyakov @ 2008-04-23 12:57 UTC
  To: Chris Mason; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

Hi Chris.

On Wed, Apr 23, 2008 at 08:50:54AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> Transaction rollback from a filesystem point of view is a reboot.  Real
> database style transactions with rollback and isolation from other procs etc
> etc are outside the scope of Btrfs.

Why is rollback a reboot? With copy-on-write it could be possible to
just commit the tree state from before the transaction started as the
current one, and thus roll back all changes. Having that possibility from
userspace could be a great benefit, since in case of an application error
it is really simple to undo all changes.

--
Evgeniy Polyakov
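In a copy-on-write store, the rollback Evgeniy describes amounts to keeping the pre-transaction root and making it current again. The toy sketch below assumes the whole store is one versioned dictionary. btrfs itself copies individual tree blocks rather than whole maps. `CowTree` and its methods are hypothetical names.

```python
class CowTree:
    """Toy copy-on-write store: every update builds a new version and old
    versions are never modified, so any saved root stays valid."""

    def __init__(self):
        self.root = {}  # current version; treated as immutable once published

    def begin(self):
        return self.root  # remember the pre-transaction root

    def put(self, key, value):
        new = dict(self.root)  # copy, never mutate the old version
        new[key] = value
        self.root = new

    def delete(self, key):
        new = dict(self.root)
        new.pop(key, None)
        self.root = new

    def rollback(self, saved_root):
        # "commit the tree state from before the transaction as the current one"
        self.root = saved_root
```

Because old versions are never written to, rollback is a single pointer swap. That is why it looks cheap from the filesystem's side.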
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-23 13:07 UTC
  To: Evgeniy Polyakov; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Wednesday 23 April 2008, Evgeniy Polyakov wrote:
> Hi Chris.
>
> On Wed, Apr 23, 2008 at 08:50:54AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > Transaction rollback from a filesystem point of view is a reboot.  Real
> > database style transactions with rollback and isolation from other procs
> > etc etc are outside the scope of Btrfs.
>
> Why is rollback a reboot? With copy-on-write it could be possible to
> just commit the tree state from before the transaction started as the
> current one, and thus roll back all changes. Having that possibility from
> userspace could be a great benefit, since in case of an application error
> it is really simple to undo all changes.

Oh, from a filesystem point of view it is very simple to undo changes,
especially with COW.  We've got snapshots and we can pull old copies from an
old snapshot etc etc.

But, userland expects things not to be undone.  Picture two procs operating in
a directory.  One proc calls fsync and gets assurance from the FS that things
are on disk.  The other proc calls rollback and undoes the fsync.  The posix
API isn't built around this.

There are definitely cases where the admin will want to be able to run a
command to shift the FS state back to some time in the past.  But, it needs
to be an admin level tool where the complex interactions between procs are
well understood (by the admin).

-chris
* Re: [Btrfs-devel] transaction ioctls
From: Evgeniy Polyakov @ 2008-04-23 13:15 UTC
  To: Chris Mason; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Wed, Apr 23, 2008 at 09:07:28AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> But, userland expects things not to be undone.  Picture two procs operating in
> a directory.  One proc calls fsync and gets assurance from the FS that things
> are on disk.  The other proc calls rollback and undoes the fsync.  The posix
> API isn't built around this.

Rollback happens per transaction, so the first application called fsync in
its own transaction, which flushed data to disk, while the second thread has
its own transaction, and that data will be removed, while data written in the
first transaction is still on disk.

> There are definitely cases where the admin will want to be able to run a
> command to shift the FS state back to some time in the past.  But, it needs
> to be an admin level tool where the complex interactions between procs are
> well understood (by the admin).

Well, whether to allow the transaction mechanism for users is the last
question imho; from a security point of view it can be limited to admin
only, although if a transaction is only a label on an operation, then it
can be allowed for users too...

--
Evgeniy Polyakov
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-23 13:23 UTC
  To: Evgeniy Polyakov; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Wednesday 23 April 2008, Evgeniy Polyakov wrote:
> On Wed, Apr 23, 2008 at 09:07:28AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > But, userland expects things not to be undone.  Picture two procs
> > operating in a directory.  One proc calls fsync and gets assurance from
> > the FS that things are on disk.  The other proc calls rollback and undoes
> > the fsync.  The posix API isn't built around this.
>
> Rollback happens per transaction, so the first application called fsync in
> its own transaction, which flushed data to disk, while the second thread has
> its own transaction, and that data will be removed, while data written in the
> first transaction is still on disk.

The kind of logging this requires is outside the scope of Btrfs ;)  It is
possible if both procs are running in different tree roots, but how about:

proc A: mkdir dir1
proc A: create dir1/file1
proc B: add data to dir1/file1
proc B: fsync dir1/file1
proc A: rollback

Filesystems can be databases, but not with the current APIs.  Userland simply
isn't built around these semantics today.

-chris
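Chris's timeline can be replayed against a toy model in which rollback restores a snapshot of the single namespace that all processes share. The model shows which guarantee breaks: proc B was told its data was durable, and the rollback silently discards it. `SharedNamespace`, `fsync` and `broken_promises` are hypothetical names for this sketch only. This is not how btrfs tracks durability.

```python
class SharedNamespace:
    """Toy filesystem: one namespace shared by every process."""

    def __init__(self):
        self.ns = {}        # path -> bytes (directories map to None)
        self.promised = {}  # path -> bytes that fsync reported as durable

    def snapshot(self):
        return dict(self.ns)

    def rollback(self, snap):
        # whole-namespace restore, as a naive transaction rollback would do
        self.ns = dict(snap)

    def fsync(self, path):
        # record what the caller was told is safely on disk
        self.promised[path] = self.ns[path]

    def broken_promises(self):
        return sorted(p for p, d in self.promised.items() if self.ns.get(p) != d)


fs = SharedNamespace()
snap_a = fs.snapshot()          # proc A: starts its transaction
fs.ns["dir1"] = None            # proc A: mkdir dir1
fs.ns["dir1/file1"] = b""       # proc A: create dir1/file1
fs.ns["dir1/file1"] += b"data"  # proc B: add data to dir1/file1
fs.fsync("dir1/file1")          # proc B: fsync dir1/file1
fs.rollback(snap_a)             # proc A: rollback
print(fs.broken_promises())     # prints ['dir1/file1']
```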
* Re: [Btrfs-devel] transaction ioctls
From: Evgeniy Polyakov @ 2008-04-23 16:21 UTC
  To: Chris Mason; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Wed, Apr 23, 2008 at 09:23:03AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> The kind of logging this requires is outside the scope of Btrfs ;)  It is
> possible if both procs are running in different tree roots, but how about:
>
> proc A: mkdir dir1
> proc A: create dir1/file1
> proc B: add data to dir1/file1
> proc B: fsync dir1/file1
> proc A: rollback

It depends on where the transaction was started and where it was stopped. If
there are exactly two transactions, started and stopped at the start and
the end of the 'trace', then rollback of transaction A means rollback of
the inner transactions too.

> Filesystems can be databases, but not with the current APIs.  Userland simply
> isn't built around these semantics today.

This is a philosophical dispute; I have always believed that there is no
difference between a database and a filesystem, only the access method
changes. And having a proper API is just a matter of taste: one can create an
ioctl-based one as a private feature of the new filesystem. No one
argues that XFS operation system should be transformed into XFS
filesystem and generic VFS helpers (although that could be a good idea).

--
Evgeniy Polyakov
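Evgeniy's nesting rule says that rolling back A also rolls back any inner transaction. That is the usual savepoint-stack behaviour. The minimal sketch below works over a dictionary, and `NestedTx` and its methods are hypothetical names. An inner commit only folds its changes into the parent, so it is not durable on its own.

```python
class NestedTx:
    """Toy nested transactions over a dict: begin() saves a copy of the
    current state, commit() drops it, rollback() restores it."""

    def __init__(self):
        self.state = {}
        self._saved = []  # stack of states captured at each begin()

    def begin(self):
        self._saved.append(dict(self.state))
        return len(self._saved)  # depth, used to address this level later

    def commit(self):
        # an inner commit merges into the enclosing transaction only
        self._saved.pop()

    def rollback(self, depth):
        # unwind every level opened at or after `depth`, innermost first
        while len(self._saved) >= depth:
            self.state = self._saved.pop()
```

Rolling back an outer level therefore discards what its inner levels did, even levels that had already committed.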
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-23 17:15 UTC
  To: Evgeniy Polyakov; +Cc: Sage Weil, btrfs-devel, Zach Brown, linux-btrfs

On Wednesday 23 April 2008, Evgeniy Polyakov wrote:
> On Wed, Apr 23, 2008 at 09:23:03AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > The kind of logging this requires is outside the scope of Btrfs ;)  It is
> > possible if both procs are running in different tree roots, but how
> > about:
> >
> > proc A: mkdir dir1
> > proc A: create dir1/file1
> > proc B: add data to dir1/file1
> > proc B: fsync dir1/file1
> > proc A: rollback
>
> It depends on where the transaction was started and where it was stopped.
> If there are exactly two transactions, started and stopped at the start and
> the end of the 'trace', then rollback of transaction A means rollback of
> the inner transactions too.

Right, these are things that real databases solve that posix doesn't expect
or understand.  The rollback will make the file disappear, but some other
process could have the file open, without a transaction running.

So, the rollback needs to provide the same semantics you get today with
unlink on an open file.  Definitely not impossible, but really outside the
scope of btrfs.

>
> > Filesystems can be databases, but not with the current APIs.  Userland
> > simply isn't built around these semantics today.
>
> This is a philosophical dispute; I have always believed that there is no
> difference between a database and a filesystem, only the access method
> changes.

We agree there, except that filesystems are able to include a number of
optimizations that databases can't because they don't have to do rollback or
private views of the data.

> And having a proper API is just a matter of taste: one can create an
> ioctl-based one as a private feature of the new filesystem.  No one
> argues that XFS operation system should be transformed into XFS
> filesystem and generic VFS helpers (although that could be a good idea).

It isn't just the API, it is the rules surrounding how and when files or
directories can disappear.

Please, prove me wrong...patches always welcome.

-chris
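The unlink semantics Chris refers to can be observed directly on any POSIX system. Once a file's name is removed, a process that already holds a descriptor can keep reading the file until the last close. The sketch below uses only Python's standard `os` and `tempfile` modules, and `read_after_unlink` is a hypothetical helper name. A filesystem-level rollback would have to preserve the same behaviour for files that other processes have open.

```python
import os
import tempfile

def read_after_unlink():
    """Create a file, keep a descriptor open, unlink the name, then read
    the data back through the still-open descriptor (POSIX semantics)."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "file1")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"still here")
        os.unlink(path)                 # the directory entry is gone...
        name_exists = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)         # ...but the inode lives until last close
    finally:
        os.close(fd)
        os.rmdir(d)
    return name_exists, data
```

On Linux or another POSIX system, `read_after_unlink()` returns `(False, b"still here")`.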
* Re: [Btrfs-devel] transaction ioctls
From: Bron Gondwana @ 2008-04-23 23:52 UTC
  To: Chris Mason; +Cc: Evgeniy Polyakov, btrfs-devel, Zach Brown, linux-btrfs

On Wed, Apr 23, 2008 at 09:23:03AM -0400, Chris Mason wrote:
> On Wednesday 23 April 2008, Evgeniy Polyakov wrote:
> > On Wed, Apr 23, 2008 at 09:07:28AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > > But, userland expects things not to be undone.  Picture two procs
> > > operating in a directory.  One proc calls fsync and gets assurance from
> > > the FS that things are on disk.  The other proc calls rollback and undoes
> > > the fsync.  The posix API isn't built around this.
> >
> > Rollback happens per transaction, so the first application called fsync in
> > its own transaction, which flushed data to disk, while the second thread has
> > its own transaction, and that data will be removed, while data written in
> > the first transaction is still on disk.
>
> The kind of logging this requires is outside the scope of Btrfs ;)  It is
> possible if both procs are running in different tree roots, but how about:
>
> proc A: mkdir dir1
> proc A: create dir1/file1
> proc B: add data to dir1/file1
> proc B: fsync dir1/file1
> proc A: rollback
>
> Filesystems can be databases, but not with the current APIs.  Userland simply
> isn't built around these semantics today.

proc A: mkdir dir1
proc A: create dir1/file1
proc B: add data to dir1/file1
proc B: fsync dir1/file1
proc A: unlink dir1/file1
proc A: rmdir dir1

I don't see the difference.

Bron.
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-24 13:06 UTC
  To: Bron Gondwana; +Cc: Evgeniy Polyakov, btrfs-devel, Zach Brown, linux-btrfs

On Wednesday 23 April 2008, Bron Gondwana wrote:
> On Wed, Apr 23, 2008 at 09:23:03AM -0400, Chris Mason wrote:
> > On Wednesday 23 April 2008, Evgeniy Polyakov wrote:
> > > On Wed, Apr 23, 2008 at 09:07:28AM -0400, Chris Mason (chris.mason@oracle.com) wrote:
> > > > But, userland expects things not to be undone.  Picture two procs
> > > > operating in a directory.  One proc calls fsync and gets assurance
> > > > from the FS that things are on disk.  The other proc calls rollback
> > > > and undoes the fsync.  The posix API isn't built around this.
> > >
> > > Rollback happens per transaction, so the first application called fsync
> > > in its own transaction, which flushed data to disk, while the second
> > > thread has its own transaction, and that data will be removed, while
> > > data written in the first transaction is still on disk.
> >
> > The kind of logging this requires is outside the scope of Btrfs ;)  It is
> > possible if both procs are running in different tree roots, but how
> > about:
> >
> > proc A: mkdir dir1
> > proc A: create dir1/file1
> > proc B: add data to dir1/file1
> > proc B: fsync dir1/file1
> > proc A: rollback
> >
> > Filesystems can be databases, but not with the current APIs.  Userland
> > simply isn't built around these semantics today.
>
> proc A: mkdir dir1
> proc A: create dir1/file1
> proc B: add data to dir1/file1
> proc B: fsync dir1/file1
> proc A: unlink dir1/file1
> proc A: rmdir dir1
>
> I don't see the difference.

The main difference is that in the unlink case, the unlink goes through a
series of code in the VFS to make sure that open file handles stay viable and
that all of the other posix rules are followed.  In the rollback case, the
filesystem has to do all of that on its own.

Here's another:

proc A: mkdir dir1
proc B: open dir1/file1 O_CREAT
proc A: rollback
proc B: close

Doing the same thing with rmdir would fail because the directory wasn't empty.
In order to provide the rollback, the FS would have to wander through all of
the dentries and do something sane with them.  It could rename the directory
to dir1.soontobedead and clean it as soon as proc B was done.

The main point is this kind of thing is littered with corner cases.  You'd
have to find each file or directory affected by the rollback and make sure
appropriate actions are taken for each one, and get it done in a VFS-friendly,
deadlock-free way.

It would definitely be an interesting project.  But, a much more common
feature request is the ability to do a few small things in an atomic unit
(like Ceph), and I think that is a much more realistic project for the short
term.

-chris
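The second timeline can be replayed with real POSIX calls to see why a rollback cannot simply reuse rmdir. The sketch uses Python's `os` module, and `rollback_vs_rmdir` is a hypothetical helper name. Plain `rmdir` on the non-empty directory fails: Linux reports ENOTEMPTY, and POSIX also permits EEXIST. Renaming the directory to `dir1.soontobedead`, as Chris suggests, leaves proc B's descriptor usable. The final cleanup is done by hand here. In a real rollback the filesystem would have to trigger it when the last descriptor closes.

```python
import errno
import os
import tempfile

def rollback_vs_rmdir():
    """Replay: proc A mkdir dir1; proc B open dir1/file1 O_CREAT; A 'rolls back'."""
    base = tempfile.mkdtemp()
    dir1 = os.path.join(base, "dir1")
    os.mkdir(dir1)                                   # proc A: mkdir dir1
    fd = os.open(os.path.join(dir1, "file1"),
                 os.O_CREAT | os.O_WRONLY, 0o600)    # proc B: open O_CREAT
    try:
        os.rmdir(dir1)                               # plain rmdir refuses
        rmdir_result = "removed"
    except OSError as e:
        rmdir_result = errno.errorcode.get(e.errno, str(e.errno))

    # Chris's alternative: move the directory out of the way...
    doomed = dir1 + ".soontobedead"
    os.rename(dir1, doomed)
    os.write(fd, b"B keeps working")                 # B's descriptor is still valid
    os.close(fd)                                     # proc B: close
    # ...and reclaim it once proc B is done (by hand in this sketch)
    os.unlink(os.path.join(doomed, "file1"))
    os.rmdir(doomed)
    os.rmdir(base)
    return rmdir_result, os.path.exists(dir1)
```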
* Re: [Btrfs-devel] transaction ioctls
From: Bron Gondwana @ 2008-04-27 10:55 UTC
  To: Chris Mason; +Cc: Evgeniy Polyakov, btrfs-devel, Zach Brown, linux-btrfs

On Thu, 24 Apr 2008 09:06:54 -0400, "Chris Mason" <chris.mason@oracle.com> said:
> On Wednesday 23 April 2008, Bron Gondwana wrote:
> > On Wed, Apr 23, 2008 at 09:23:03AM -0400, Chris Mason wrote:
> > > proc A: mkdir dir1
> > > proc A: create dir1/file1
> > > proc B: add data to dir1/file1
> > > proc B: fsync dir1/file1
> > > proc A: rollback
> > >
> > > Filesystems can be databases, but not with the current APIs.  Userland
> > > simply isn't built around these semantics today.
> >
> > proc A: mkdir dir1
> > proc A: create dir1/file1
> > proc B: add data to dir1/file1
> > proc B: fsync dir1/file1
> > proc A: unlink dir1/file1
> > proc A: rmdir dir1
> >
> > I don't see the difference.
>
> The main difference is that in the unlink case, the unlink goes through a
> series of code in the VFS to make sure that open file handles stay viable
> and that all of the other posix rules are followed.  In the rollback case,
> the filesystem has to do all of that on its own.
>
> Here's another:
>
> proc A: mkdir dir1
> proc B: open dir1/file1 O_CREAT
> proc A: rollback
> proc B: close
>
> [... I've trimmed the following a bit, it's only partially quoted...]
>
> Doing the same thing with rmdir would fail because the directory wasn't
> empty.  In order to provide the rollback, the FS would have to wander
> through all of the dentries and do something sane with them....
>
> The main point is this kind of thing is littered with corner cases.
> You'd have to find each file or directory affected by the rollback
> and make sure appropriate actions are taken for each one, and get
> it done in a VFS-friendly, deadlock-free way.

Yeah, that's a good point.  I suspect my first pass idea for this would
look remarkably like a soft-mounted NFS drive that had been disconnected.
Ooops, your little bit of filesystem went away - EIO, byebye.

Bron.
--
  Bron Gondwana
  brong@fastmail.fm
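Bron's "EIO, byebye" approach can be modelled with a generation counter. Each handle remembers the generation it was opened under, and a rollback bumps that counter, so every stale handle fails with EIO instead of seeing rolled-back state. This is a toy sketch of the idea, not how btrfs or NFS implement it. `Volume`, `Handle` and `rollback` are hypothetical names.

```python
import errno

class Volume:
    """Toy volume whose rollback invalidates every handle opened before it."""

    def __init__(self):
        self.generation = 0
        self.files = {}  # path -> bytes

    def open(self, path):
        return Handle(self, path, self.generation)

    def rollback(self, snapshot):
        self.files = dict(snapshot)
        self.generation += 1  # every existing handle is now stale

class Handle:
    def __init__(self, vol, path, gen):
        self.vol, self.path, self.gen = vol, path, gen

    def _check(self):
        # like a disconnected soft-mounted NFS share: fail instead of lying
        if self.gen != self.vol.generation:
            raise OSError(errno.EIO, "filesystem state rolled back", self.path)

    def write(self, data):
        self._check()
        self.vol.files[self.path] = self.vol.files.get(self.path, b"") + data

    def read(self):
        self._check()
        return self.vol.files.get(self.path, b"")
```

The design trades POSIX-friendliness for simplicity: nothing has to walk the dentries, but every process holding a handle sees I/O errors after the rollback.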
* Re: [Btrfs-devel] transaction ioctls
From: btrfs-devel @ 2008-04-23 17:12 UTC
  To: linux-btrfs

> But, userland expects things not to be undone.  Picture two procs
> operating in a directory.  One proc calls fsync and gets assurance from
> the FS that things are on disk.  The other proc calls rollback and undoes
> the fsync.  The posix API isn't built around this.

Right, but it's not out of the question to dedicate a filesystem to a
single application.  This is reasonably common even with LVMs.
* Re: [Btrfs-devel] transaction ioctls
From: Chris Mason @ 2008-04-22 20:32 UTC
  To: btrfs-devel; +Cc: Sage Weil, linux-btrfs

On Tuesday 22 April 2008, Sage Weil wrote:
> Hi Chris,
>
> These ioctls let a user application hold a transaction open while it
> performs a series of operations.  A final ioctl does a sync on the fs
> (closing the current transaction).  This is the main requirement for
> Ceph's OSD to be able to keep the data it's storing in a btrfs volume
> consistent, and AFAICS it works just fine.  The application would do
> something like
>

I'm definitely willing to include it for you to experiment with.  Holding a
transaction from userland can indeed lead to deadlock, but in this case your
userland basically owns the server anyway.  I'm worried about some nasty
corner cases still, but btrfs is blissfully ignoring those right now anyway.

One problem will be operations that are basically boundless (truncating a
file, large writes).  Eventually the ENOSPC support will hook into the
transaction system to make sure a given operation reserves enough free space.

With your ioctls, the "do a bunch of stuff" will need to honor the same
accounting rules as the kernel code (which don't exist yet).

I thought your original plan was to do all of this from userland (without a
kernel filesystem at all)?  The btrfs progs share most of the same code with
the kernel, so with a little love to the transaction and IO subsystems, you'd
be able to use it as a library style DB.

-chris
* Re: [Btrfs-devel] transaction ioctls
From: Sage Weil @ 2008-04-22 20:48 UTC
  To: Chris Mason; +Cc: btrfs-devel, linux-btrfs

On Tue, 22 Apr 2008, Chris Mason wrote:
> I'm definitely willing to include it for you to experiment with.  Holding a
> transaction from userland can indeed lead to deadlock, but in this case your
> userland basically owns the server anyway.  I'm worried about some nasty
> corner cases still, but btrfs is blissfully ignoring those right now anyway.
>
> One problem will be operations that are basically boundless (truncating a
> file, large writes).  Eventually the ENOSPC support will hook into the
> transaction system to make sure a given operation reserves enough free space.
>
> With your ioctls, the "do a bunch of stuff" will need to honor the same
> accounting rules as the kernel code (which don't exist yet).

So, if the transaction start ioctl made a space reservation, and if _all_ fs
ops were wrapped by such reservations, that should avoid ENOSPC, yeah?
That doesn't really help with the memory pressure issue, though.  :(

> I thought your original plan was to do all of this from userland (without a
> kernel filesystem at all)?  The btrfs progs share most of the same code with
> the kernel, so with a little love to the transaction and IO subsystems, you'd
> be able to use it as a library style DB.

Yeah...  The issue is just that "a little love" is significantly more love
than this handful of ioctls, and I'm a little wary of getting into it.  That
does seem like a better long term solution, though.

sage
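Sage's reservation idea can be sketched as simple accounting. The transaction-start call reserves an upper bound, each operation draws from it, and unused space is returned at the end, so ENOSPC can only surface at start time. This is a toy model with hypothetical names (`SpaceAccounting`, `Reservation`). It assumes the caller can bound its space needs in advance, which is hard for the boundless operations Chris mentions. As Sage notes, it also does nothing about memory pressure.

```python
import errno

class SpaceAccounting:
    """Toy free-space pool; reservations are taken when a transaction starts."""

    def __init__(self, free_bytes):
        self.free = free_bytes

    def start(self, reserve):
        # the only place ENOSPC can be reported in this model
        if reserve > self.free:
            raise OSError(errno.ENOSPC, "cannot reserve space for transaction")
        self.free -= reserve
        return Reservation(self, reserve)

class Reservation:
    def __init__(self, acct, amount):
        self.acct, self.left = acct, amount

    def charge(self, nbytes):
        # exceeding the reservation is a caller bug, not an ENOSPC condition
        if nbytes > self.left:
            raise RuntimeError("operation exceeded its reservation")
        self.left -= nbytes

    def end(self):
        self.acct.free += self.left  # give back whatever was not used
        self.left = 0
```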
end of thread

Thread overview: 19+ messages
[not found] <Pine.LNX.4.64.0804221210130.23551@cobra.newdream.net>
2008-04-22 20:29 ` [Btrfs-devel] transaction ioctls Zach Brown
2008-04-22 20:41 ` Chris Mason
2008-04-22 20:52 ` Sage Weil
2008-04-22 20:55 ` Chris Mason
2008-04-22 20:56 ` Sage Weil
2008-04-22 21:05 ` Evgeniy Polyakov
2008-04-23 12:50 ` Chris Mason
2008-04-23 12:57 ` Evgeniy Polyakov
2008-04-23 13:07 ` Chris Mason
2008-04-23 13:15 ` Evgeniy Polyakov
2008-04-23 13:23 ` Chris Mason
[not found] ` <200804230923.03752.chris.mason@oracle.com>
2008-04-23 16:21 ` Evgeniy Polyakov
2008-04-23 17:15 ` Chris Mason
2008-04-23 23:52 ` Bron Gondwana
2008-04-24 13:06 ` Chris Mason
[not found] ` <200804240906.54788.chris.mason@oracle.com>
2008-04-27 10:55 ` Bron Gondwana
2008-04-23 17:12 ` btrfs-devel
2008-04-22 20:32 ` Chris Mason
2008-04-22 20:48 ` Sage Weil