public inbox for linux-ext4@vger.kernel.org
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Jan Kara <jack@suse.cz>, Mingming Cao <cmm@us.ibm.com>
Cc: cmm@us.ibm.com, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: Fix delalloc sync hang with journal lock inversion
Date: Wed, 11 Jun 2008 19:26:31 +0530
Message-ID: <20080611135631.GA15169@skywalker>
In-Reply-To: <20080611124157.GB8121@duck.suse.cz>

On Wed, Jun 11, 2008 at 02:41:57PM +0200, Jan Kara wrote:
> On Fri 06-06-08 00:49:09, Aneesh Kumar K.V wrote:
> > On Thu, Jun 05, 2008 at 06:22:09PM +0200, Jan Kara wrote:
> > >   I like it. I'm only not sure whether there cannot be two users of
> > > write_cache_pages() operating on the same mapping at the same time. Because
> > > then they could alter writeback_index under each other and that would
> > > probably result in unpleasant behavior. I think there can be two parallel
> > > calls for example from sync_single_inode() and sync_page_range().
> > >   In that case we'd need something like writeback_index inside wbc (or
> > > maybe just alter range_start automatically when range_cont is set?) so that
> > > > parallel callers do not influence each other.
> > > 
> > 
> > commit e56edfdeea0d336e496962782f08e1224a101cf2
> > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Date:   Fri Jun 6 00:47:35 2008 +0530
> > 
> >     mm: Add range_cont mode for writeback.
> >     
> > >     Filesystems like ext4 need to start a new transaction in
> > >     writepages for block allocation. This happens with delayed
> > >     allocation, and there is a limit to how many credits we can request
> > >     from the journal layer. So we call write_cache_pages multiple
> > >     times with wbc->nr_to_write set to the maximum possible value
> > >     limited by the max journal credits available.
> >     
> >     Add a new mode to writeback that enables us to handle this
> >     behaviour. If mapping->writeback_index is not set we use
> >     wbc->range_start to find the start index and then at the end
> >     of write_cache_pages we store the index in writeback_index. Next
> >     call to write_cache_pages will start writeout from writeback_index.
> >     Also we limit writing to the specified wbc->range_end.
>   I think this changelog is out of date...

The patch in the patchqueue has an updated changelog.


> 
> >     Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index f462439..0d8573e 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -63,6 +63,7 @@ struct writeback_control {
> >  	unsigned for_writepages:1;	/* This is a writepages() call */
> >  	unsigned range_cyclic:1;	/* range_start is cyclic */
> >  	unsigned more_io:1;		/* more io to be dispatched */
> > +	unsigned range_cont:1;
> >  };
> >  
> >  /*
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 789b6ad..182233b 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -882,6 +882,9 @@ int write_cache_pages(struct address_space *mapping,
> >  	if (wbc->range_cyclic) {
> >  		index = mapping->writeback_index; /* Start from prev offset */
> >  		end = -1;
> > +	} else if (wbc->range_cont) {
> > +		index = wbc->range_start >> PAGE_CACHE_SHIFT;
> > +		end = wbc->range_end >> PAGE_CACHE_SHIFT;
>   Hmm, why isn't this in the next else?

The patch in the patchqueue has:

+       } else if (wbc->range_cont) {
+               index = wbc->range_start >> PAGE_CACHE_SHIFT;
+               end = wbc->range_end >> PAGE_CACHE_SHIFT;
+               /*
+                * we want to set the writeback_index when congested
+                * and we are requesting for nonblocking mode,
+                * because we won't force the range_cont mode then
+                */
+               if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
+                       range_whole = 1;


I was not clear about setting scanned = 1. Now that I read it again, I
guess it makes sense to set scanned = 1: we don't need to restart the
writeout from index = 0 when range_cont is set.


> 
> >  	} else {
> >  		index = wbc->range_start >> PAGE_CACHE_SHIFT;
> >  		end = wbc->range_end >> PAGE_CACHE_SHIFT;
> > @@ -956,6 +959,9 @@ int write_cache_pages(struct address_space *mapping,
> >  	}
> >  	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
> >  		mapping->writeback_index = index;
> > +
> > +	if (wbc->range_cont)
> > +		wbc->range_start = index << PAGE_CACHE_SHIFT;
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(write_cache_pages);
> 
> 									Honza

Attaching the updated patch.

Mingming,

Can you update the patchqueue with the patch attached below?

-aneesh

mm: Add range_cont mode for writeback.

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Filesystems like ext4 need to start a new transaction in
writepages for block allocation. This happens with delayed
allocation, and there is a limit to how many credits we can request
from the journal layer. So we call write_cache_pages multiple
times with wbc->nr_to_write set to the maximum possible value
limited by the max journal credits available.

Add a new mode to writeback that enables us to handle this
behaviour. In the new mode we update wbc->range_start
to point to the next offset to be written. The next
call to write_cache_pages will start writeout from the specified
range_start offset. In the new mode we also limit writing
to the specified wbc->range_end.


Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---

 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)


diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index f462439..0d8573e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -63,6 +63,7 @@ struct writeback_control {
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned more_io:1;		/* more io to be dispatched */
+	unsigned range_cont:1;
 };
 
 /*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 789b6ad..ded57d5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -956,6 +956,9 @@ int write_cache_pages(struct address_space *mapping,
 	}
 	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
 		mapping->writeback_index = index;
+
+	if (wbc->range_cont)
+		wbc->range_start = index << PAGE_CACHE_SHIFT;
 	return ret;
 }
 EXPORT_SYMBOL(write_cache_pages);
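
For illustration only, here is a rough userspace sketch of how a caller
could drive the new mode. It is not kernel code: toy_wbc, toy_mapping,
toy_write_cache_pages() and toy_writepages() are hypothetical stand-ins,
PAGE_SHIFT is assumed to be 12, and the per-call budget plays the role of
the journal credit limit. Since the resume offset lives in the caller's
own wbc rather than in mapping->writeback_index, two callers working on
the same mapping would not overwrite each other's position.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define NR_PAGES   64	/* size of the toy mapping */

/* Hypothetical stand-in for the writeback_control fields used here. */
struct toy_wbc {
	long long range_start;	/* byte offset; advanced in range_cont mode */
	long long range_end;	/* byte offset, inclusive */
	long nr_to_write;	/* page budget for a single call */
	unsigned range_cont:1;
};

/* Toy mapping: dirty[i] stays true until page i has been written out. */
struct toy_mapping {
	bool dirty[NR_PAGES];
};

/*
 * Mimics only the part of write_cache_pages() that the patch touches:
 * walk [range_start, range_end], stop when the budget is used up, and
 * in range_cont mode record where the next call should resume.
 */
static int toy_write_cache_pages(struct toy_mapping *m, struct toy_wbc *wbc)
{
	long index = wbc->range_start >> PAGE_SHIFT;
	long end = wbc->range_end >> PAGE_SHIFT;
	int written = 0;

	while (index <= end && index < NR_PAGES && wbc->nr_to_write > 0) {
		if (m->dirty[index]) {
			m->dirty[index] = false;	/* "write" the page */
			wbc->nr_to_write--;
			written++;
		}
		index++;
	}
	if (wbc->range_cont)
		wbc->range_start = (long long)index << PAGE_SHIFT;
	return written;
}

/*
 * Hypothetical writepages-style caller: each pass gets a fresh budget
 * (standing in for one transaction's worth of credits) and continues
 * from the range_start left behind by the previous pass.
 * Returns the number of passes that were needed.
 */
static int toy_writepages(struct toy_mapping *m, long long start,
			  long long end, long budget)
{
	struct toy_wbc wbc = {
		.range_start = start,
		.range_end = end,
		.range_cont = 1,
	};
	int calls = 0;

	assert(budget > 0);	/* a zero budget would never make progress */
	while (wbc.range_start <= wbc.range_end &&
	       (wbc.range_start >> PAGE_SHIFT) < NR_PAGES) {
		wbc.nr_to_write = budget;
		toy_write_cache_pages(m, &wbc);
		calls++;
	}
	return calls;
}
```

With every page dirty, writing the first 16 pages with a budget of 5
pages per pass takes four passes, and pages past range_end are left
untouched.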
