* reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode @ 2014-06-30 9:35 Ivan Shapovalov 2014-06-30 12:19 ` Edward Shishkin 0 siblings, 1 reply; 4+ messages in thread From: Ivan Shapovalov @ 2014-06-30 9:35 UTC (permalink / raw) To: reiserfs-devel; +Cc: Edward Shishkin [-- Attachment #1: Type: text/plain, Size: 1096 bytes --] Hi, I've started to think about implementing the second part of discard support, namely "batch mode" (FITRIM ioctl). And it seems like I don't yet quite understand how to do it. It had been suggested to reuse existing transaction and discard machinery for this feature: create a transaction, allocate+deallocate all possible blocks and then force-commit it. However, the algorithms of discard_atom() are very inoptimal for discarding large amounts of known free space -- a bitmap check is performed for every single discard unit. Repeatedly calling reiser4_alloc_blocks() to allocate every possible block also seems inefficient. And this will still miss those 10% of reserved space, IIUC. So the best way I can imagine is to introduce a new space allocator method, "iterate free space", and discard all reported extents (blkdev_issue_discard() will take care of aligning them properly). With such method, a question arises: how to prevent bitmap modifications and disk writes to free space when such iteration is in progress? Thanks, -- Ivan Shapovalov / intelfx / [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 213 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode 2014-06-30 9:35 reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode Ivan Shapovalov @ 2014-06-30 12:19 ` Edward Shishkin 2014-06-30 13:24 ` Ivan Shapovalov 0 siblings, 1 reply; 4+ messages in thread From: Edward Shishkin @ 2014-06-30 12:19 UTC (permalink / raw) To: Ivan Shapovalov; +Cc: reiserfs-devel Hi Ivan, I think that everything should be quite simple. Let the FITRIM ioctl spawn a process, which scans all the bitmap blocks and puts the free-space-extents to the deferred delete_set. At the same time, we mark them as "busy" in the WORKING bitmap to make sure that nobody will touch them (the post_commit_hook will make them clean again). Basically, that's all: our transaction manager and the discard code will do all the other work :) Details First, the trim process reads up a bitmap node and puts it to the transaction (see try_capture() and friends). Then we go from left to right in the region of blocks covered by the bitmap node and handle the "free extents". In every such iteration we lock the bitmap node, locate the free extent, mark it dirty in the WORKING bitmap, put the respective entry to the deferred delete_set and unlock the bitmap node. NOTE that commit of atoms spawned by the trim process will be unusual: no dirty jnodes, hence, no flush, no IOs (as RELOCATE and OVERWRITE sets are empty). Only issuing discard requests.. Of course, this is only in the case when other processes spawning dirty jnodes didn't join our atom (they can do it perfectly, as we lock bitmaps only per a free extent handling). There can be potential problems around the "unusual" commits (false positives in the debugging code, etc.), but I hope they are minor.. Thanks, Edward. On 06/30/2014 11:35 AM, Ivan Shapovalov wrote: > Hi, > > I've started to think about implementing the second part of discard support, > namely "batch mode" (FITRIM ioctl). And it seems like I don't yet quite > understand how to do it. > > It had been suggested to reuse existing transaction and discard machinery for > this feature: create a transaction, allocate+deallocate all possible blocks > and then force-commit it. > > However, the algorithms of discard_atom() are very inoptimal for discarding > large amounts of known free space -- a bitmap check is performed for every > single discard unit. Repeatedly calling reiser4_alloc_blocks() to allocate > every possible block also seems inefficient. And this will still miss those > 10% of reserved space, IIUC. > > So the best way I can imagine is to introduce a new space allocator method, > "iterate free space", and discard all reported extents (blkdev_issue_discard() > will take care of aligning them properly). > > With such method, a question arises: how to prevent bitmap modifications > and disk writes to free space when such iteration is in progress? > > Thanks, ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode 2014-06-30 12:19 ` Edward Shishkin @ 2014-06-30 13:24 ` Ivan Shapovalov 2014-06-30 23:59 ` Edward Shishkin 0 siblings, 1 reply; 4+ messages in thread From: Ivan Shapovalov @ 2014-06-30 13:24 UTC (permalink / raw) To: Edward Shishkin; +Cc: reiserfs-devel [-- Attachment #1: Type: text/plain, Size: 3995 bytes --] On Monday 30 June 2014 at 14:19:33, Edward Shishkin wrote: > Hi Ivan, > > I think that everything should be quite simple. > > Let the FITRIM ioctl spawn a process, which scans all the bitmap blocks > and puts the free-space-extents to the deferred delete_set. At the same > time, we mark them as "busy" in the WORKING bitmap to make sure that > nobody will touch them (the post_commit_hook will make them clean > again). Basically, that's all: our transaction manager and the discard > code will do all the other work :) As I've stated, I'm afraid that it will be very inoptimal. The current algorithm used for "real-time" discard is not suited for discarding big chunks of known free space: it verifies each discard unit by querying the bitmap. (This is even worse as current implementation is simplified to do one bitmap query per each discard unit, not checking if a bitmap block is bigger than a discard unit, thus it can result in doing multiple bitmap queries for the same block.) However, we can add a per-transaction flag "batch mode discard" which changes the discard algorithm to something suitable for this scenario. If this "discard transaction" can be merged with usual ones, we still need to verify all extents... But it can be done not so thoroughly. - do one bitmap query per each recorded extent - if OK, then discard - if not, just skip the whole extent > > Details > > First, the trim process reads up a bitmap node and puts it to the > transaction (see try_capture() and friends). Then we go from left to Hm, bitmap nodes are never captured by current code, so won't this effectively be a no-op? > right in the region of blocks covered by the bitmap node and handle > the "free extents". In every such iteration we lock the bitmap node, > locate the free extent, mark it dirty in the WORKING bitmap, put the > respective entry to the deferred delete_set and unlock the bitmap node. This can be done on allocator level by adding an "iterate free space" interface method... > > NOTE that commit of atoms spawned by the trim process will be unusual: > no dirty jnodes, hence, no flush, no IOs (as RELOCATE and OVERWRITE > sets are empty). Only issuing discard requests.. Of course, this is > only in the case when other processes spawning dirty jnodes didn't > join our atom (they can do it perfectly, as we lock bitmaps only per a > free extent handling). Again, bitmap nodes are not captured by current code. Is it critical that we fuse with anything modifying the bitmaps? Thanks, -- Ivan Shapovalov / intelfx / > > There can be potential problems around the "unusual" commits (false > positives in the debugging code, etc.), but I hope they are minor.. > > Thanks, > Edward. > > > On 06/30/2014 11:35 AM, Ivan Shapovalov wrote: > > Hi, > > > > I've started to think about implementing the second part of discard support, > > namely "batch mode" (FITRIM ioctl). And it seems like I don't yet quite > > understand how to do it. > > > > It had been suggested to reuse existing transaction and discard machinery for > > this feature: create a transaction, allocate+deallocate all possible blocks > > and then force-commit it. > > > > However, the algorithms of discard_atom() are very inoptimal for discarding > > large amounts of known free space -- a bitmap check is performed for every > > single discard unit. Repeatedly calling reiser4_alloc_blocks() to allocate > > every possible block also seems inefficient. And this will still miss those > > 10% of reserved space, IIUC. > > > > So the best way I can imagine is to introduce a new space allocator method, > > "iterate free space", and discard all reported extents (blkdev_issue_discard() > > will take care of aligning them properly). > > > > With such method, a question arises: how to prevent bitmap modifications > > and disk writes to free space when such iteration is in progress? > > > > Thanks, > [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 213 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode 2014-06-30 13:24 ` Ivan Shapovalov @ 2014-06-30 23:59 ` Edward Shishkin 0 siblings, 0 replies; 4+ messages in thread From: Edward Shishkin @ 2014-06-30 23:59 UTC (permalink / raw) To: Ivan Shapovalov; +Cc: reiserfs-devel On 06/30/2014 03:24 PM, Ivan Shapovalov wrote: > On Monday 30 June 2014 at 14:19:33, Edward Shishkin wrote: >> Hi Ivan, >> >> I think that everything should be quite simple. >> >> Let the FITRIM ioctl spawn a process, which scans all the bitmap blocks >> and puts the free-space-extents to the deferred delete_set. At the same >> time, we mark them as "busy" in the WORKING bitmap to make sure that >> nobody will touch them (the post_commit_hook will make them clean >> again). Basically, that's all: our transaction manager and the discard >> code will do all the other work :) > As I've stated, I'm afraid that it will be very inoptimal. The current > algorithm used for "real-time" discard is not suited for discarding > big chunks of known free space: So who knows the size of free-space extents when we run the ioctl? > it verifies each discard unit by > querying the bitmap. (This is even worse as current implementation > is simplified to do one bitmap query per each discard unit, not checking > if a bitmap block is bigger than a discard unit, Why do we need to check it? One bitmap block covers 128MiB. Are there such discard units in the nature? > thus it can result > in doing multiple bitmap queries for the same block.) > > However, we can add a per-transaction flag "batch mode discard" which changes > the discard algorithm to something suitable for this scenario. > If this "discard transaction" can be merged with usual ones, we still > need to verify all extents... But it can be done not so thoroughly. > - do one bitmap query per each recorded extent > - if OK, then discard > - if not, just skip the whole extent The only optimization I can see is skipping the phases of sorting and merge. Because we scan bitmaps from left to right and nobody breaks the order of extents in the delete set (assuming that nobody joins our atom, see below). In other bits everything looks like in the case of "real-time" discard, that is like "capturing" free-space extents of random offsets and sizes. No? > >> Details >> >> First, the trim process reads up a bitmap node and puts it to the >> transaction (see try_capture() and friends). Then we go from left to > Hm, bitmap nodes are never captured by current code, so won't this effectively > be a no-op? Perhaps, my information about the transaction manages is outdated. We don't fuse atoms on common bitmap nodes? Excellent. Don't capture bitmaps then and skip the sorting phase. Just add a check that extents go in the right order when issuing discard requests. Edward. > >> right in the region of blocks covered by the bitmap node and handle >> the "free extents". In every such iteration we lock the bitmap node, >> locate the free extent, mark it dirty in the WORKING bitmap, put the >> respective entry to the deferred delete_set and unlock the bitmap node. > This can be done on allocator level by adding an "iterate free space" > interface method... > >> NOTE that commit of atoms spawned by the trim process will be unusual: >> no dirty jnodes, hence, no flush, no IOs (as RELOCATE and OVERWRITE >> sets are empty). Only issuing discard requests.. Of course, this is >> only in the case when other processes spawning dirty jnodes didn't >> join our atom (they can do it perfectly, as we lock bitmaps only per a >> free extent handling). > Again, bitmap nodes are not captured by current code. Is it critical that we > fuse with anything modifying the bitmaps? > > Thanks, ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-06-30 23:59 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-30 9:35 reiser4: discard implementation, part 2: FITRIM ioctl aka batch mode Ivan Shapovalov 2014-06-30 12:19 ` Edward Shishkin 2014-06-30 13:24 ` Ivan Shapovalov 2014-06-30 23:59 ` Edward Shishkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).