All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-kernel@vger.kernel.org, mark.fasheh@oracle.com,
	linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
Date: Wed, 13 Jun 2007 22:51:46 +0400	[thread overview]
Message-ID: <20070613185146.GF13815@localhost.sw.ru> (raw)
In-Reply-To: <20070613114356.GD17547@wotan.suse.de>

On 13:43 Срд 13 Июн     , Nick Piggin wrote:
> On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote:
> > On 14:19 ?????? 29 ??????     , akpm@linux-foundation.org wrote:
> > > 
> > > The patch titled
> > >      fs: introduce write_begin, write_end, and perform_write aops
> > > has been added to the -mm tree.  Its filename is
> > >      fs-introduce-write_begin-write_end-and-perform_write-aops.patch
> > > 
> > > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> > > 
> > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> > > out what to do about this
> > > 
> > > ------------------------------------------------------
> > > Subject: fs: introduce write_begin, write_end, and perform_write aops
> > > From: Nick Piggin <npiggin@suse.de>
> > > 
> > > These are intended to replace prepare_writ eand commit_write with more
> > > flexible alternatives that are also able to avoid the buffered write
> > > deadlock problems efficiently (which prepare_write is unable to do).
> > > 
> > > [mark.fasheh@oracle.com: API design contributions, code review and fixes]
> > > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > > Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > I've finaly find time to review Nick's "write_begin/write_end aop" patch set.
> > And i have some fixes and questions. My be i've missed somthing and it was 
> > already disscussed, but i cant find in LKML.
> 
> Thanks, that's really helpful.
> 
>  
> > 1) loop dev:
> > 	loop.c code itself is not perfect. In fact before Nick's patch
> > 	partial write was't possible. Assumption what write chunks are
> > 	always page aligned is realy weird ( see "index++" line).
> > 	Fixed by "new aop loop fix" patch
> 
> I think you're right, fix looks good.
> 
>  
> > 2)block_write_begin:
> > 	After we enter to block_write_begin with *pagep == NULL and
> > 	some page was grabed we remember this page in *pagep
> > 	And if __block_prepare_write() we have to clear *pagep , as 
> > 	it was before. Because this may confuse caller.
> > 	for example caller may have folowing code:
> > 		ret = block_write_begin(..., pagep,...)
> > 		if (ret && *pagep != NULL) {
> > 			unlock_page(*pagep);
> > 			page_cache_release(*pagep);
> > 		}
> > 	Fixed my "new aop block_write_begin fix" patch
> 
> I don't think the caller can rely on that if it returns failure.
> But that is more defensive I guess. Maybe setting it to 1 or
> so would catch abusers.
> 
>  
> > 3) __page_symlink:
> > 	Nick's patch add folowing code:
> > 	+ err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
> > 	+                 AOP_FLAG_UNINTERRUPTIBLE, &page,&fsdata);
> > 	symlink now consume whole page. I have only one question "WHY???".
> > 	I don't see any advantages, but where are huge list of
> > 	dissdvantages:
> > 	a) fs with blksize == 1k and pagesize == 16k after this patch
> > 	   waste more than 10x times disk space for nothing.
> > 	b) What happends if we want use fs with blksize == 4k on i386
> > 	   after it was used by ia64 ??? (before this patch it was
> > 	   possible).
One more visiable effect caused by wrong symlink size:
fsstress cause folowing error:
<LOG BEGIN>
EXT3-fs unexpected failure: !buffer_revoked(bh); 
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke
ext3_abort called.

EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting
revoke
Remounting filesystem read-only 
Aborting journal on device dm-4.
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted

journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_truncate: IO failure
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_delete_inode: IO failure
<LOG END>

After symlink size was fixed to len-1 problem dissappeared. 
> > 	
> > 	I dont prepare patch for this because i dont understand issue
> > 	witch Nick aimed to fix.
> 
> I don't know why myself :P I think it would be just fine to use
> len-1 as it did previously, so it must have been a typo?
> 
> 
> > 4) iov_iter_fault_in_readable:
> > 	Function prerform check for signgle region, with out respect to
> > 	segment nature of iovec, For example writev no longer works :) :
> > 	writev(3, [{"\1", 1}, {"\2"..., 4096}], 2) = -1 EFAULT (Bad address)
> > 	this hidden bug, and it was invisiable because XXXX_fault_in_readable
> > 	return value was ignored before. Lets iov_iter_fault_in_readable
> > 	perform checks for all segments.
> > 	Fixed by :"iov_iter_fault_in_readable fix"
> 
> OK thanks. I would rather just change this to use the length of
> the first iovec always (prefaulting multiple iovecs would have to
> be benchmarked on real apps that make heavy use of writev, I
> suspect).
> 
>  
> > 5) ext3_write_end:
> > 	Before  write_begin/write_end patch set we have folowing locking
> > 	order:
> > 		stop_journal(handle);
> > 		unlock_page(page);
> > 	But now order is oposite:
> > 		unlock_page(page);
> > 		stop_journal(handle);
> > 	Can we got any race condition now? I'm not sure is it actual problem,
> > 	may be somebody cant describe this.
> 
> Can we just change it to the original order? That would seem to be
> safest unless one of the ext3 devs explicitly acks it.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2007-06-13 15:51 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-29 21:19 + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree akpm
2007-05-30  3:13 ` Nick Piggin
2007-05-30  3:24   ` Andrew Morton
2007-05-30 22:44     ` Mark Fasheh
2007-05-30 22:49       ` Andrew Morton
2007-05-30 10:39   ` Steven Whitehouse
2007-05-31  5:50     ` Nick Piggin
2007-06-13 13:40 ` Dmitriy Monakhov
2007-06-13 11:43   ` Nick Piggin
2007-06-13 18:51     ` Dmitriy Monakhov [this message]
2007-06-13 23:07     ` Badari Pulavarty
2007-06-13 23:28       ` Nick Piggin
2007-06-14  9:52       ` Jan Kara
2007-06-14 10:39         ` Nick Piggin
2007-06-13 13:50   ` [patch] new aop block_write_begin fix Dmitriy Monakhov
2007-06-13 13:57   ` iov_iter_fault_in_readable fix Dmitriy Monakhov
2007-06-14 17:31     ` Christoph Hellwig
2007-06-14 22:21       ` David Chinner
2007-06-14 22:34         ` Christoph Hellwig
2007-06-16 18:17       ` Dmitriy Monakhov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070613185146.GF13815@localhost.sw.ru \
    --to=dmonakhov@sw.ru \
    --cc=akpm@linux-foundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.fasheh@oracle.com \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.