* Re: [PATCH] incorrect error handling inside generic_file_direct_write [not found] <87k60y1rq4.fsf@sw.ru> @ 2006-12-11 20:40 ` Andrew Morton 2006-12-12 9:22 ` Dmitriy Monakhov 2006-12-12 12:20 ` Dmitriy Monakhov 0 siblings, 2 replies; 12+ messages in thread From: Andrew Morton @ 2006-12-11 20:40 UTC (permalink / raw) To: Dmitriy Monakhov; +Cc: linux-kernel, Linux Memory Management, devel, xfs On Mon, 11 Dec 2006 16:34:27 +0300 Dmitriy Monakhov <dmonakhov@openvz.org> wrote: > OpenVZ team has discovered error inside generic_file_direct_write() > If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated > a few blocks outside i_size. And fsck will complain about wrong i_size > (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), > after fsck will fix error i_size will be increased to the biggest block, > but this blocks contain gurbage from previous write attempt, this is not > information leak, but its silence file data corruption. > We need truncate any block beyond i_size after write have failed , do in simular > generic_file_buffered_write() error path. > > Exampe: > open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 > write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device) > > stat mnt2/FILE3 > File: `mnt2/FILE3' > Size: 0 Blocks: 4 IO Block: 4096 regular empty file > >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx > Device: 700h/1792d Inode: 14 Links: 1 > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) > > fsck.ext2 -f -n mnt1/fs_img > Pass 1: Checking inodes, blocks, and sizes > Inode 14, i_size is 0, should be 2048. Fix? no > > Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> > ---------- > > diff --git a/mm/filemap.c b/mm/filemap.c > index 7b84dc8..bf7cf6c 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb * > mark_inode_dirty(inode); > } > *ppos = end; > + } else if (written < 0) { > + loff_t isize = i_size_read(inode); > + /* > + * generic_file_direct_IO() may have instantiated a few blocks > + * outside i_size. Trim these off again. > + */ > + if (pos + count > isize) > + vmtruncate(inode, isize); > } > XFS (at least) can call generic_file_direct_write() with i_mutex not held. And vmtruncate() expects i_mutex to be held. I guess a suitable solution would be to push this problem back up to the callers: let them decide whether to run vmtruncate() and if so, to ensure that i_mutex is held. The existence of generic_file_aio_write_nolock() makes that rather messy though. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-11 20:40 ` [PATCH] incorrect error handling inside generic_file_direct_write Andrew Morton @ 2006-12-12 9:22 ` Dmitriy Monakhov 2006-12-12 6:36 ` Andrew Morton 2006-12-12 12:20 ` Dmitriy Monakhov 1 sibling, 1 reply; 12+ messages in thread From: Dmitriy Monakhov @ 2006-12-12 9:22 UTC (permalink / raw) To: Andrew Morton Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs Andrew Morton <akpm@osdl.org> writes: > On Mon, 11 Dec 2006 16:34:27 +0300 > Dmitriy Monakhov <dmonakhov@openvz.org> wrote: > >> OpenVZ team has discovered error inside generic_file_direct_write() >> If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated >> a few blocks outside i_size. And fsck will complain about wrong i_size >> (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), >> after fsck will fix error i_size will be increased to the biggest block, >> but this blocks contain gurbage from previous write attempt, this is not >> information leak, but its silence file data corruption. >> We need truncate any block beyond i_size after write have failed , do in simular >> generic_file_buffered_write() error path. >> >> Exampe: >> open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 >> write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device) >> >> stat mnt2/FILE3 >> File: `mnt2/FILE3' >> Size: 0 Blocks: 4 IO Block: 4096 regular empty file >> >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx >> Device: 700h/1792d Inode: 14 Links: 1 >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) >> >> fsck.ext2 -f -n mnt1/fs_img >> Pass 1: Checking inodes, blocks, and sizes >> Inode 14, i_size is 0, should be 2048. Fix? no >> >> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> >> ---------- >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 7b84dc8..bf7cf6c 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb * >> mark_inode_dirty(inode); >> } >> *ppos = end; >> + } else if (written < 0) { >> + loff_t isize = i_size_read(inode); >> + /* >> + * generic_file_direct_IO() may have instantiated a few blocks >> + * outside i_size. Trim these off again. >> + */ >> + if (pos + count > isize) >> + vmtruncate(inode, isize); >> } >> > > XFS (at least) can call generic_file_direct_write() with i_mutex not held. How could it be ? from mm/filemap.c:2046 generic_file_direct_write() comment right after place where i want to add vmtruncate() /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. * i_mutex is held, which protects generic_osync_inode() from * livelocking. */ > And vmtruncate() expects i_mutex to be held. generic_file_direct_IO must called under i_mutex too from mm/filemap.c:2388 /* * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something * went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, This means XFS generic_file_direct_write() call generic_file_direct_IO() without i_mutex held too? > > I guess a suitable solution would be to push this problem back up to the > callers: let them decide whether to run vmtruncate() and if so, to ensure > that i_mutex is held. > > The existence of generic_file_aio_write_nolock() makes that rather messy > though. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 9:22 ` Dmitriy Monakhov @ 2006-12-12 6:36 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2006-12-12 6:36 UTC (permalink / raw) To: Dmitriy Monakhov Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs On Tue, 12 Dec 2006 12:22:14 +0300 Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > >> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb * > >> mark_inode_dirty(inode); > >> } > >> *ppos = end; > >> + } else if (written < 0) { > >> + loff_t isize = i_size_read(inode); > >> + /* > >> + * generic_file_direct_IO() may have instantiated a few blocks > >> + * outside i_size. Trim these off again. > >> + */ > >> + if (pos + count > isize) > >> + vmtruncate(inode, isize); > >> } > >> > > > > XFS (at least) can call generic_file_direct_write() with i_mutex not held. > How could it be ? > > from mm/filemap.c:2046 generic_file_direct_write() comment right after > place where i want to add vmtruncate() > /* > * Sync the fs metadata but not the minor inode changes and > * of course not the data as we did direct DMA for the IO. > * i_mutex is held, which protects generic_osync_inode() from > * livelocking. > */ > > > And vmtruncate() expects i_mutex to be held. > generic_file_direct_IO must called under i_mutex too > from mm/filemap.c:2388 > /* > * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something > * went wrong during pagecache shootdown. > */ > static ssize_t > generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, yup, the comments are wrong. > This means XFS generic_file_direct_write() call generic_file_direct_IO() without > i_mutex held too? Think so. XFS uses blockdev_direct_IO_own_locking(). We'd need to check with the XFS guys regarding its precise operation and what needs to be done here. > > > > I guess a suitable solution would be to push this problem back up to the > > callers: let them decide whether to run vmtruncate() and if so, to ensure > > that i_mutex is held. > > > > The existence of generic_file_aio_write_nolock() makes that rather messy > > though. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-11 20:40 ` [PATCH] incorrect error handling inside generic_file_direct_write Andrew Morton 2006-12-12 9:22 ` Dmitriy Monakhov @ 2006-12-12 12:20 ` Dmitriy Monakhov 2006-12-12 9:52 ` Andrew Morton 1 sibling, 1 reply; 12+ messages in thread From: Dmitriy Monakhov @ 2006-12-12 12:20 UTC (permalink / raw) To: Andrew Morton Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs Andrew Morton <akpm@osdl.org> writes: > On Mon, 11 Dec 2006 16:34:27 +0300 > Dmitriy Monakhov <dmonakhov@openvz.org> wrote: > >> OpenVZ team has discovered error inside generic_file_direct_write() >> If generic_file_direct_IO() has fail (ENOSPC condition) it may have instantiated >> a few blocks outside i_size. And fsck will complain about wrong i_size >> (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), >> after fsck will fix error i_size will be increased to the biggest block, >> but this blocks contain gurbage from previous write attempt, this is not >> information leak, but its silence file data corruption. >> We need truncate any block beyond i_size after write have failed , do in simular >> generic_file_buffered_write() error path. >> >> Exampe: >> open("mnt2/FILE3", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 >> write(3, "aaaaaa"..., 4096) = -1 ENOSPC (No space left on device) >> >> stat mnt2/FILE3 >> File: `mnt2/FILE3' >> Size: 0 Blocks: 4 IO Block: 4096 regular empty file >> >>>>>>>>>>>>>>>>>>>>>>^^^^^^^^^^ file size is less than biggest block idx >> Device: 700h/1792d Inode: 14 Links: 1 >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) >> >> fsck.ext2 -f -n mnt1/fs_img >> Pass 1: Checking inodes, blocks, and sizes >> Inode 14, i_size is 0, should be 2048. Fix? no >> >> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> >> ---------- >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 7b84dc8..bf7cf6c 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -2041,6 +2041,14 @@ generic_file_direct_write(struct kiocb * >> mark_inode_dirty(inode); >> } >> *ppos = end; >> + } else if (written < 0) { >> + loff_t isize = i_size_read(inode); >> + /* >> + * generic_file_direct_IO() may have instantiated a few blocks >> + * outside i_size. Trim these off again. >> + */ >> + if (pos + count > isize) >> + vmtruncate(inode, isize); >> } >> > > XFS (at least) can call generic_file_direct_write() with i_mutex not held. > And vmtruncate() expects i_mutex to be held. > > I guess a suitable solution would be to push this problem back up to the > callers: let them decide whether to run vmtruncate() and if so, to ensure > that i_mutex is held. > > The existence of generic_file_aio_write_nolock() makes that rather messy > though. This means we may call generic_file_aio_write_nolock() without i_mutex, right? but call trace is : generic_file_aio_write_nolock() ->generic_file_buffered_write() /* i_mutex not held here */ but according to filemaps locking rules: mm/filemap.c:77 .. * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) .. I'm confused a litle bit, where is the truth? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 12:20 ` Dmitriy Monakhov @ 2006-12-12 9:52 ` Andrew Morton 2006-12-12 13:18 ` Dmitriy Monakhov 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2006-12-12 9:52 UTC (permalink / raw) To: Dmitriy Monakhov Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs On Tue, 12 Dec 2006 15:20:52 +0300 Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > > XFS (at least) can call generic_file_direct_write() with i_mutex not held. > > And vmtruncate() expects i_mutex to be held. > > > > I guess a suitable solution would be to push this problem back up to the > > callers: let them decide whether to run vmtruncate() and if so, to ensure > > that i_mutex is held. > > > > The existence of generic_file_aio_write_nolock() makes that rather messy > > though. > This means we may call generic_file_aio_write_nolock() without i_mutex, right? > but call trace is : > generic_file_aio_write_nolock() > ->generic_file_buffered_write() /* i_mutex not held here */ > but according to filemaps locking rules: mm/filemap.c:77 > .. > * ->i_mutex (generic_file_buffered_write) > * ->mmap_sem (fault_in_pages_readable->do_page_fault) > .. > I'm confused a litle bit, where is the truth? xfs_write() calls generic_file_direct_write() without taking i_mutex for O_DIRECT writes. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 9:52 ` Andrew Morton @ 2006-12-12 13:18 ` Dmitriy Monakhov 2006-12-12 10:40 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: Dmitriy Monakhov @ 2006-12-12 13:18 UTC (permalink / raw) To: Andrew Morton Cc: Dmitriy Monakhov, Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs [-- Attachment #1: Type: text/plain, Size: 1433 bytes --] Andrew Morton <akpm@osdl.org> writes: > On Tue, 12 Dec 2006 15:20:52 +0300 > Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > >> > XFS (at least) can call generic_file_direct_write() with i_mutex not held. >> > And vmtruncate() expects i_mutex to be held. >> > >> > I guess a suitable solution would be to push this problem back up to the >> > callers: let them decide whether to run vmtruncate() and if so, to ensure >> > that i_mutex is held. >> > >> > The existence of generic_file_aio_write_nolock() makes that rather messy >> > though. >> This means we may call generic_file_aio_write_nolock() without i_mutex, right? >> but call trace is : >> generic_file_aio_write_nolock() >> ->generic_file_buffered_write() /* i_mutex not held here */ >> but according to filemaps locking rules: mm/filemap.c:77 >> .. >> * ->i_mutex (generic_file_buffered_write) >> * ->mmap_sem (fault_in_pages_readable->do_page_fault) >> .. >> I'm confused a litle bit, where is the truth? > > xfs_write() calls generic_file_direct_write() without taking i_mutex for > O_DIRECT writes. Yes, but my quastion is about __generic_file_aio_write_nolock(). As i understand _nolock sufix means that i_mutex was already locked by caller, am i right ? If yes, than __generic_file_aio_write_nolock() is beter place for vmtrancate() acclivity after generic_file_direct_write() has fail. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> ------- [-- Attachment #2: diff-generic-direct-io-write-fix --] [-- Type: text/plain, Size: 587 bytes --] diff --git a/mm/filemap.c b/mm/filemap.c index 7b84dc8..723d2ca 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2282,6 +2282,15 @@ __generic_file_aio_write_nolock(struct k written = generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); + if (written < 0) { + loff_t isize = i_size_read(inode); + /* + * generic_file_direct_write() may have instantiated + * a few blocks outside i_size. Trim these off again. + */ + if (pos + count > isize) + vmtruncate(inode, isize); + } if (written < 0 || written == count) goto out; /* ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 13:18 ` Dmitriy Monakhov @ 2006-12-12 10:40 ` Andrew Morton 2006-12-12 23:14 ` Dmitriy Monakhov 2006-12-13 2:43 ` Chen, Kenneth W 0 siblings, 2 replies; 12+ messages in thread From: Andrew Morton @ 2006-12-12 10:40 UTC (permalink / raw) To: Dmitriy Monakhov Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs On Tue, 12 Dec 2006 16:18:32 +0300 Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > >> but according to filemaps locking rules: mm/filemap.c:77 > >> .. > >> * ->i_mutex (generic_file_buffered_write) > >> * ->mmap_sem (fault_in_pages_readable->do_page_fault) > >> .. > >> I'm confused a litle bit, where is the truth? > > > > xfs_write() calls generic_file_direct_write() without taking i_mutex for > > O_DIRECT writes. > Yes, but my quastion is about __generic_file_aio_write_nolock(). > As i understand _nolock sufix means that i_mutex was already locked > by caller, am i right ? Nope. It just means that __generic_file_aio_write_nolock() doesn't take the lock. We don't assume or require that the caller took it. For example the raw driver calls generic_file_aio_write_nolock() without taking i_mutex. Raw isn't relevant to the problem (although ocfs2 might be). But we cannot assume that all callers have taken i_mutex, I think. I guess we can make that a rule (document it, add BUG_ON(!mutex_is_locked(..)) if it isn't a blockdev) if needs be. After really checking that this matches reality for all callers. It's important, too - if we have an unprotected i_size_write() then the seqlock can get out of sync due to a race and then i_size_read() locks up the kernel. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 10:40 ` Andrew Morton @ 2006-12-12 23:14 ` Dmitriy Monakhov 2006-12-13 2:43 ` Chen, Kenneth W 1 sibling, 0 replies; 12+ messages in thread From: Dmitriy Monakhov @ 2006-12-12 23:14 UTC (permalink / raw) To: Andrew Morton Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs [-- Attachment #1: Type: text/plain, Size: 1535 bytes --] Andrew Morton <akpm@osdl.org> writes: > On Tue, 12 Dec 2006 16:18:32 +0300 > Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > >> >> but according to filemaps locking rules: mm/filemap.c:77 >> >> .. >> >> * ->i_mutex (generic_file_buffered_write) >> >> * ->mmap_sem (fault_in_pages_readable->do_page_fault) >> >> .. >> >> I'm confused a litle bit, where is the truth? >> > >> > xfs_write() calls generic_file_direct_write() without taking i_mutex for >> > O_DIRECT writes. >> Yes, but my quastion is about __generic_file_aio_write_nolock(). >> As i understand _nolock sufix means that i_mutex was already locked >> by caller, am i right ? > > Nope. It just means that __generic_file_aio_write_nolock() doesn't take > the lock. We don't assume or require that the caller took it. For example > the raw driver calls generic_file_aio_write_nolock() without taking > i_mutex. Raw isn't relevant to the problem (although ocfs2 might be). But > we cannot assume that all callers have taken i_mutex, I think. > > I guess we can make that a rule (document it, add > BUG_ON(!mutex_is_locked(..)) if it isn't a blockdev) if needs be. After > really checking that this matches reality for all callers. I've checked generic_file_aio_write_nolock() callers for non blockdev. Only ocfs2 call it explicitly, and do it under i_mutex. So we need to do: 1) Change wrong comments. 2) Add BUG_ON(!mutex_is_locked(..)) for non blkdev. 3) Invoke vmtruncate only for non blkdev. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> ------- [-- Attachment #2: direct-io-fix.patch --] [-- Type: text/plain, Size: 2123 bytes --] diff --git a/mm/filemap.c b/mm/filemap.c index 7b84dc8..540ef5e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2046,8 +2046,8 @@ generic_file_direct_write(struct kiocb * /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. - * i_mutex is held, which protects generic_osync_inode() from - * livelocking. + * i_mutex may not being held, if so some specific locking + * ordering must protect generic_osync_inode() from livelocking. */ if (written >= 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { int err = generic_osync_inode(inode, mapping, OSYNC_METADATA); @@ -2282,6 +2282,17 @@ __generic_file_aio_write_nolock(struct k written = generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); + /* + * If host is not S_ISBLK generic_file_direct_write() may + * have instantiated a few blocks outside i_size files + * Trim these off again. + */ + if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) { + loff_t isize = i_size_read(inode); + if (pos + count > isize) + vmtruncate(inode, isize); + } + if (written < 0 || written == count) goto out; /* @@ -2344,6 +2355,13 @@ ssize_t generic_file_aio_write_nolock(st ssize_t ret; BUG_ON(iocb->ki_pos != pos); + /* + * generic_file_buffered_write() may be called inside + * __generic_file_aio_write_nolock() even in case of + * O_DIRECT for non S_ISBLK files. So i_mutex must be held. + */ + if (!S_ISBLK(inode->i_mode)) + BUG_ON(!mutex_is_locked(&inode->i_mutex)); ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos); @@ -2386,8 +2404,8 @@ ssize_t generic_file_aio_write(struct ki EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * May be called without i_mutex for writes to S_ISREG files. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, ^ permalink raw reply related [flat|nested] 12+ messages in thread
* RE: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-12 10:40 ` Andrew Morton 2006-12-12 23:14 ` Dmitriy Monakhov @ 2006-12-13 2:43 ` Chen, Kenneth W 2006-12-15 10:43 ` 'Christoph Hellwig' 1 sibling, 1 reply; 12+ messages in thread From: Chen, Kenneth W @ 2006-12-13 2:43 UTC (permalink / raw) To: 'Andrew Morton', Dmitriy Monakhov, 'Christoph Hellwig' Cc: Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs Andrew Morton wrote on Tuesday, December 12, 2006 2:40 AM > On Tue, 12 Dec 2006 16:18:32 +0300 > Dmitriy Monakhov <dmonakhov@sw.ru> wrote: > > > >> but according to filemaps locking rules: mm/filemap.c:77 > > >> .. > > >> * ->i_mutex (generic_file_buffered_write) > > >> * ->mmap_sem (fault_in_pages_readable->do_page_fault) > > >> .. > > >> I'm confused a litle bit, where is the truth? > > > > > > xfs_write() calls generic_file_direct_write() without taking i_mutex for > > > O_DIRECT writes. > > Yes, but my quastion is about __generic_file_aio_write_nolock(). > > As i understand _nolock sufix means that i_mutex was already locked > > by caller, am i right ? > > Nope. It just means that __generic_file_aio_write_nolock() doesn't take > the lock. We don't assume or require that the caller took it. For example > the raw driver calls generic_file_aio_write_nolock() without taking > i_mutex. Raw isn't relevant to the problem (although ocfs2 might be). But > we cannot assume that all callers have taken i_mutex, I think. I think we should also clean up generic_file_aio_write_nolock. This was brought up a couple of weeks ago and I gave up too early. Here is my second attempt. How about the following patch, I think we can kill generic_file_aio_write_nolock and merge both *file_aio_write_nolock into one function, then generic_file_aio_write ocfs2_file_aio_write blk_dev->aio_write all points to a non-lock version of __generic_file_aio_write(). First two already hold i_mutex, while the block device's aio_write method doesn't require i_mutex to be held. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> diff -Nurp linux-2.6.19/drivers/char/raw.c linux-2.6.19.ken/drivers/char/raw.c --- linux-2.6.19/drivers/char/raw.c 2006-11-29 13:57:37.000000000 -0800 +++ linux-2.6.19.ken/drivers/char/raw.c 2006-12-12 16:41:39.000000000 -0800 @@ -242,7 +242,7 @@ static const struct file_operations raw_ .read = do_sync_read, .aio_read = generic_file_aio_read, .write = do_sync_write, - .aio_write = generic_file_aio_write_nolock, + .aio_write = __generic_file_aio_write, .open = raw_open, .release= raw_release, .ioctl = raw_ioctl, diff -Nurp linux-2.6.19/fs/block_dev.c linux-2.6.19.ken/fs/block_dev.c --- linux-2.6.19/fs/block_dev.c 2006-11-29 13:57:37.000000000 -0800 +++ linux-2.6.19.ken/fs/block_dev.c 2006-12-12 16:47:58.000000000 -0800 @@ -1198,7 +1198,7 @@ const struct file_operations def_blk_fop .read = do_sync_read, .write = do_sync_write, .aio_read = generic_file_aio_read, - .aio_write = generic_file_aio_write_nolock, + .aio_write = __generic_file_aio_write, .mmap = generic_file_mmap, .fsync = block_fsync, .unlocked_ioctl = block_ioctl, diff -Nurp linux-2.6.19/fs/ocfs2/file.c linux-2.6.19.ken/fs/ocfs2/file.c --- linux-2.6.19/fs/ocfs2/file.c 2006-11-29 13:57:37.000000000 -0800 +++ linux-2.6.19.ken/fs/ocfs2/file.c 2006-12-12 16:42:09.000000000 -0800 @@ -1107,7 +1107,7 @@ static ssize_t ocfs2_file_aio_write(stru /* communicate with ocfs2_dio_end_io */ ocfs2_iocb_set_rw_locked(iocb); - ret = generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb->ki_pos); + ret = __generic_file_aio_write(iocb, iov, nr_segs, iocb->ki_pos); /* buffered aio wouldn't have proper lock coverage today */ BUG_ON(ret == -EIOCBQUEUED && !(filp->f_flags & O_DIRECT)); diff -Nurp linux-2.6.19/include/linux/fs.h linux-2.6.19.ken/include/linux/fs.h --- linux-2.6.19/include/linux/fs.h 2006-11-29 13:57:37.000000000 -0800 +++ linux-2.6.19.ken/include/linux/fs.h 2006-12-12 16:41:58.000000000 -0800 @@ -1742,7 +1742,7 @@ extern int file_send_actor(read_descript int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk); extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t); extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t); -extern ssize_t generic_file_aio_write_nolock(struct kiocb *, const struct iovec *, +extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t); extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *, unsigned long *, loff_t, loff_t *, size_t, size_t); diff -Nurp linux-2.6.19/mm/filemap.c linux-2.6.19.ken/mm/filemap.c --- linux-2.6.19/mm/filemap.c 2006-11-29 13:57:37.000000000 -0800 +++ linux-2.6.19.ken/mm/filemap.c 2006-12-12 16:47:58.000000000 -0800 @@ -2219,9 +2219,9 @@ zero_length_segment: } EXPORT_SYMBOL(generic_file_buffered_write); -static ssize_t -__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t *ppos) +ssize_t +__generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, + unsigned long nr_segs, loff_t pos) { struct file *file = iocb->ki_filp; struct address_space * mapping = file->f_mapping; @@ -2229,9 +2229,10 @@ __generic_file_aio_write_nolock(struct k size_t count; /* after file limit checks */ struct inode *inode = mapping->host; unsigned long seg; - loff_t pos; + loff_t *ppos = &iocb->ki_pos; ssize_t written; ssize_t err; + ssize_t ret; ocount = 0; for (seg = 0; seg < nr_segs; seg++) { @@ -2254,7 +2255,6 @@ __generic_file_aio_write_nolock(struct k } count = ocount; - pos = *ppos; vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); @@ -2332,32 +2332,16 @@ __generic_file_aio_write_nolock(struct k } out: current->backing_dev_info = NULL; - return written ? written : err; -} - -ssize_t generic_file_aio_write_nolock(struct kiocb *iocb, - const struct iovec *iov, unsigned long nr_segs, loff_t pos) -{ - struct file *file = iocb->ki_filp; - struct address_space *mapping = file->f_mapping; - struct inode *inode = mapping->host; - ssize_t ret; - - BUG_ON(iocb->ki_pos != pos); - - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, - &iocb->ki_pos); + ret = written ? written : err; if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { - ssize_t err; - err = sync_page_range_nolock(inode, mapping, pos, ret); if (err < 0) ret = err; } return ret; } -EXPORT_SYMBOL(generic_file_aio_write_nolock); +EXPORT_SYMBOL(__generic_file_aio_write); ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos) @@ -2370,8 +2354,7 @@ ssize_t generic_file_aio_write(struct ki BUG_ON(iocb->ki_pos != pos); mutex_lock(&inode->i_mutex); - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, - &iocb->ki_pos); + ret = __generic_file_aio_write(iocb, iov, nr_segs, pos); mutex_unlock(&inode->i_mutex); if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-13 2:43 ` Chen, Kenneth W @ 2006-12-15 10:43 ` 'Christoph Hellwig' 2006-12-15 18:53 ` Chen, Kenneth W 0 siblings, 1 reply; 12+ messages in thread From: 'Christoph Hellwig' @ 2006-12-15 10:43 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'Andrew Morton', Dmitriy Monakhov, 'Christoph Hellwig', Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs > +ssize_t > +__generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, > + unsigned long nr_segs, loff_t pos) I'd still call this generic_file_aio_write_nolock. > + loff_t *ppos = &iocb->ki_pos; I'd rather use iocb->ki_pos directly in the few places ppos is referenced currently. > if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { > - ssize_t err; > - > err = sync_page_range_nolock(inode, mapping, pos, ret); > if (err < 0) > ret = err; > } So we're doing the sync_page_range once in __generic_file_aio_write with i_mutex held. > mutex_lock(&inode->i_mutex); > - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, > - &iocb->ki_pos); > + ret = __generic_file_aio_write(iocb, iov, nr_segs, pos); > mutex_unlock(&inode->i_mutex); > > if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { And then another time after it's unlocked, this seems wrong. ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-15 10:43 ` 'Christoph Hellwig' @ 2006-12-15 18:53 ` Chen, Kenneth W 2007-01-02 11:17 ` 'Christoph Hellwig' 0 siblings, 1 reply; 12+ messages in thread From: Chen, Kenneth W @ 2006-12-15 18:53 UTC (permalink / raw) To: 'Christoph Hellwig' Cc: 'Andrew Morton', Dmitriy Monakhov, Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs Christoph Hellwig wrote on Friday, December 15, 2006 2:44 AM > So we're doing the sync_page_range once in __generic_file_aio_write > with i_mutex held. > > > > mutex_lock(&inode->i_mutex); > > - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, > > - &iocb->ki_pos); > > + ret = __generic_file_aio_write(iocb, iov, nr_segs, pos); > > mutex_unlock(&inode->i_mutex); > > > > if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { > > And then another time after it's unlocked, this seems wrong. I didn't invent that mess though. I should've ask the question first: in 2.6.20-rc1, generic_file_aio_write will call sync_page_range twice, once from __generic_file_aio_write_nolock and once within the function itself. Is it redundant? Can we delete the one in the top level function? Like the following? --- ./mm/filemap.c.orig 2006-12-15 09:02:58.000000000 -0800 +++ ./mm/filemap.c 2006-12-15 09:03:19.000000000 -0800 @@ -2370,14 +2370,6 @@ ssize_t generic_file_aio_write(struct ki ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos); mutex_unlock(&inode->i_mutex); - - if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { - ssize_t err; - - err = sync_page_range(inode, mapping, pos, ret); - if (err < 0) - ret = err; - } return ret; } EXPORT_SYMBOL(generic_file_aio_write); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] incorrect error handling inside generic_file_direct_write 2006-12-15 18:53 ` Chen, Kenneth W @ 2007-01-02 11:17 ` 'Christoph Hellwig' 0 siblings, 0 replies; 12+ messages in thread From: 'Christoph Hellwig' @ 2007-01-02 11:17 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'Christoph Hellwig', 'Andrew Morton', Dmitriy Monakhov, Dmitriy Monakhov, linux-kernel, Linux Memory Management, devel, xfs On Fri, Dec 15, 2006 at 10:53:18AM -0800, Chen, Kenneth W wrote: > Christoph Hellwig wrote on Friday, December 15, 2006 2:44 AM > > So we're doing the sync_page_range once in __generic_file_aio_write > > with i_mutex held. > > > > > > > mutex_lock(&inode->i_mutex); > > > - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, > > > - &iocb->ki_pos); > > > + ret = __generic_file_aio_write(iocb, iov, nr_segs, pos); > > > mutex_unlock(&inode->i_mutex); > > > > > > if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { > > > > And then another time after it's unlocked, this seems wrong. > > > I didn't invent that mess though. > > I should've ask the question first: in 2.6.20-rc1, generic_file_aio_write > will call sync_page_range twice, once from __generic_file_aio_write_nolock > and once within the function itself. Is it redundant? Can we delete the > one in the top level function? Like the following? Really? I'm looking at -rc3 now as -rc1 is rather old and it's definitly not the case there. I also can't remember ever doing this - when I started the generic read/write path untangling I had exactly the same situation that's now in -rc3: - generic_file_aio_write_nolock calls sync_page_range_nolock - generic_file_aio_write calls sync_page_range - __generic_file_aio_write_nolock doesn't call any sync_page_range variant ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-01-02 11:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <87k60y1rq4.fsf@sw.ru>
2006-12-11 20:40 ` [PATCH] incorrect error handling inside generic_file_direct_write Andrew Morton
2006-12-12 9:22 ` Dmitriy Monakhov
2006-12-12 6:36 ` Andrew Morton
2006-12-12 12:20 ` Dmitriy Monakhov
2006-12-12 9:52 ` Andrew Morton
2006-12-12 13:18 ` Dmitriy Monakhov
2006-12-12 10:40 ` Andrew Morton
2006-12-12 23:14 ` Dmitriy Monakhov
2006-12-13 2:43 ` Chen, Kenneth W
2006-12-15 10:43 ` 'Christoph Hellwig'
2006-12-15 18:53 ` Chen, Kenneth W
2007-01-02 11:17 ` 'Christoph Hellwig'
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox