* [PATCH] Btrfs: shrink_delalloc check bdi write congested @ 2012-09-29 13:20 Itaru Kitayama 2012-09-29 16:21 ` David Sterba 0 siblings, 1 reply; 7+ messages in thread From: Itaru Kitayama @ 2012-09-29 13:20 UTC (permalink / raw) To: linux-btrfs In srhink_delalloc(), writeback starts if idle, also check the bdi is not write congested. The patch is against the head of the btrfs-next. Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index efb044e..caa74d3 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 t while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) writeback_in WB_REASON_FS_FREE_SPACE); /* ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-29 13:20 [PATCH] Btrfs: shrink_delalloc check bdi write congested Itaru Kitayama @ 2012-09-29 16:21 ` David Sterba 2012-09-30 1:28 ` Itaru Kitayama 0 siblings, 1 reply; 7+ messages in thread From: David Sterba @ 2012-09-29 16:21 UTC (permalink / raw) To: Itaru Kitayama; +Cc: linux-btrfs On Sat, Sep 29, 2012 at 10:20:09PM +0900, Itaru Kitayama wrote: > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 t > while (delalloc_bytes && loops < 3) { > max_reclaim = min(delalloc_bytes, to_reclaim); > nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; > - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, > + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) writeback_in > WB_REASON_FS_FREE_SPACE); malformed patch ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-29 16:21 ` David Sterba @ 2012-09-30 1:28 ` Itaru Kitayama 2012-09-30 2:11 ` Itaru Kitayama 0 siblings, 1 reply; 7+ messages in thread From: Itaru Kitayama @ 2012-09-30 1:28 UTC (permalink / raw) To: linux-btrfs; +Cc: dave Resubmit this after the checkpatch test. In srhink_delalloc(), writeback starts if idle, also check the bdi is not write congested. The patch is against the head of the btrfs-next. Signed-off-by: Itaru Kitayama <kitayama@cl.bb4.ne.jp> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index efb044e..1aae046 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, - WB_REASON_FS_FREE_SPACE); + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) + writeback_inodes_sb_nr_if_idle(root->fs_info, nr_page, + WB_REASON_FS_FREE_SPACE); /* * We need to wait for the async pages to actually start before On Sun, Sep 30, 2012 at 1:21 AM, David Sterba <dave@jikos.cz> wrote: > On Sat, Sep 29, 2012 at 10:20:09PM +0900, Itaru Kitayama wrote: >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 t >> while (delalloc_bytes && loops < 3) { >> max_reclaim = min(delalloc_bytes, to_reclaim); >> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; >> - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, >> + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) writeback_in >> WB_REASON_FS_FREE_SPACE); > > malformed patch ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-30 1:28 ` Itaru Kitayama @ 2012-09-30 2:11 ` Itaru Kitayama 2012-09-30 22:29 ` David Sterba 0 siblings, 1 reply; 7+ messages in thread From: Itaru Kitayama @ 2012-09-30 2:11 UTC (permalink / raw) To: linux-btrfs; +Cc: dave This is the correct one. Signed-off-by: Itaru Kitayama <kitayama@cl.bb4.ne.jp> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index efb044e..c032dbe 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, - WB_REASON_FS_FREE_SPACE); + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) + writeback_inodes_sb_nr_if_idle(root->fs_info, nr_pages, + WB_REASON_FS_FREE_SPACE); /* * We need to wait for the async pages to actually start before On Sun, Sep 30, 2012 at 10:28 AM, Itaru Kitayama <kitayama@cl.bb4u.ne.jp> wrote: > Resubmit this after the checkpatch test. > > In srhink_delalloc(), writeback starts if idle, also check the bdi is > not write congested. > The patch is against the head of the btrfs-next. > > Signed-off-by: Itaru Kitayama <kitayama@cl.bb4.ne.jp> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index efb044e..1aae046 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root > *root, u64 to_reclaim, u64 orig, > while (delalloc_bytes && loops < 3) { > max_reclaim = min(delalloc_bytes, to_reclaim); > nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; > - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, > - WB_REASON_FS_FREE_SPACE); > + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) > + writeback_inodes_sb_nr_if_idle(root->fs_info, nr_page, > + WB_REASON_FS_FREE_SPACE); > > /* > * We need to wait for the async pages to actually start before > > > On Sun, Sep 30, 2012 at 1:21 AM, David Sterba <dave@jikos.cz> wrote: >> On Sat, Sep 29, 2012 at 10:20:09PM +0900, Itaru Kitayama wrote: >>> --- a/fs/btrfs/extent-tree.c >>> +++ b/fs/btrfs/extent-tree.c >>> @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 t >>> while (delalloc_bytes && loops < 3) { >>> max_reclaim = min(delalloc_bytes, to_reclaim); >>> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; >>> - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, >>> + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) writeback_in >>> WB_REASON_FS_FREE_SPACE); >> >> malformed patch ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-30 2:11 ` Itaru Kitayama @ 2012-09-30 22:29 ` David Sterba 2012-09-30 23:55 ` Itaru Kitayama 0 siblings, 1 reply; 7+ messages in thread From: David Sterba @ 2012-09-30 22:29 UTC (permalink / raw) To: Itaru Kitayama; +Cc: linux-btrfs, dave Hi again, On Sun, Sep 30, 2012 at 11:11:10AM +0900, Itaru Kitayama wrote: > This is the correct one. > > Signed-off-by: Itaru Kitayama <kitayama@cl.bb4.ne.jp> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index efb044e..c032dbe 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root > *root, u64 to_reclaim, u64 orig, > while (delalloc_bytes && loops < 3) { > max_reclaim = min(delalloc_bytes, to_reclaim); > nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; > - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, > - WB_REASON_FS_FREE_SPACE); > + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) > + writeback_inodes_sb_nr_if_idle(root->fs_info, nr_pages, > + WB_REASON_FS_FREE_SPACE); You don't pass the same argument to writeback_inodes_sb_nr_if_idle in the changed code, this would not compile (root->fs_info->sb), but it's a minor thing. I'm more interested in the background motivation of the change, it's clear that it tries to avoid writing data if the devices are congested, have you measured an improvement against original behaviour? writeback_inodes_sb_nr_if_idle checks if writeback is in progress and does not start if this is true. That way this will not hammer the device unnecessarily. Your code tries to avoid even more hammering of the device when the writes do not come from writeback. This may or may not be a good thing (and hard to guess without a few tests, or at least I don't see that). Shrink delalloc starts the writeback for a given number of pages and then hopes they'll be flushed so the reserved space can be reclaimed back. If the device is congested, this will not start the writeback and it would be very unlikely that total delalloc bytes shrinks. The rest of the function relies on asynchronous behaviour, it's even less clear what it would do without the writeback call. In the worst case it could block on 'wait_event' or at 'btrfs_wait_ordered_extents' in some case, though this is just more of a speculation. david ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-30 22:29 ` David Sterba @ 2012-09-30 23:55 ` Itaru Kitayama 2012-10-01 0:10 ` David Sterba 0 siblings, 1 reply; 7+ messages in thread From: Itaru Kitayama @ 2012-09-30 23:55 UTC (permalink / raw) To: dave; +Cc: linux-btrfs Hi David, Thank you for your comments. I wanted to fix a lockdep warning on a possible deadlock case encountered during the xfstests with a scratch space almost full. You are right I encountered the worst scenario you described below, I drop this patch and I'll look at btrfs_congested_fn more to examine the mechanisms implemented there are working as expected. Thanks, Itaru On Mon, Oct 1, 2012 at 7:29 AM, David Sterba <dave@jikos.cz> wrote: > Hi again, > > On Sun, Sep 30, 2012 at 11:11:10AM +0900, Itaru Kitayama wrote: >> This is the correct one. >> >> Signed-off-by: Itaru Kitayama <kitayama@cl.bb4.ne.jp> >> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index efb044e..c032dbe 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root >> *root, u64 to_reclaim, u64 orig, >> while (delalloc_bytes && loops < 3) { >> max_reclaim = min(delalloc_bytes, to_reclaim); >> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; >> - writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages, >> - WB_REASON_FS_FREE_SPACE); >> + if (!bdi_write_congested(root->fs_info->sb->s_bdi)) >> + writeback_inodes_sb_nr_if_idle(root->fs_info, nr_pages, >> + WB_REASON_FS_FREE_SPACE); > > You don't pass the same argument to writeback_inodes_sb_nr_if_idle in > the changed code, this would not compile (root->fs_info->sb), but it's > a minor thing. > > I'm more interested in the background motivation of the change, it's > clear that it tries to avoid writing data if the devices are congested, > have you measured an improvement against original behaviour? > > writeback_inodes_sb_nr_if_idle checks if writeback is in progress and > does not start if this is true. That way this will not hammer the device > unnecessarily. > > Your code tries to avoid even more hammering of the device when the > writes do not come from writeback. This may or may not be a good thing > (and hard to guess without a few tests, or at least I don't see that). > > Shrink delalloc starts the writeback for a given number of pages and > then hopes they'll be flushed so the reserved space can be reclaimed > back. If the device is congested, this will not start the writeback and > it would be very unlikely that total delalloc bytes shrinks. The rest of > the function relies on asynchronous behaviour, it's even less clear what > it would do without the writeback call. In the worst case it could block > on 'wait_event' or at 'btrfs_wait_ordered_extents' in some case, though > this is just more of a speculation. > > > david ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested 2012-09-30 23:55 ` Itaru Kitayama @ 2012-10-01 0:10 ` David Sterba 0 siblings, 0 replies; 7+ messages in thread From: David Sterba @ 2012-10-01 0:10 UTC (permalink / raw) To: Itaru Kitayama; +Cc: dave, linux-btrfs On Mon, Oct 01, 2012 at 08:55:28AM +0900, Itaru Kitayama wrote: > Thank you for your comments. I wanted to fix a lockdep warning on a > possible deadlock > case encountered during the xfstests with a scratch space almost full. This is the known cleaner/writeback deadlock and avoiding writeback under device congestion may avoid it, but not in all cases and we want to fix it properly (Miao sent the patches, but they're not merged so far). > You are right I encountered the worst scenario you described below, I > drop this patch and > I'll look at btrfs_congested_fn more to examine the mechanisms > implemented there are > working as expected. I'm not sure what you want to achieve with the congestion checks (other than avoid the cleaner deadlock). david ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-10-01 0:10 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-29 13:20 [PATCH] Btrfs: shrink_delalloc check bdi write congested Itaru Kitayama 2012-09-29 16:21 ` David Sterba 2012-09-30 1:28 ` Itaru Kitayama 2012-09-30 2:11 ` Itaru Kitayama 2012-09-30 22:29 ` David Sterba 2012-09-30 23:55 ` Itaru Kitayama 2012-10-01 0:10 ` David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).