From: Jan Kara <jack@suse.cz>
To: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: Jan Kara <jack@suse.cz>, Ted Ts'o <tytso@mit.edu>,
Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>,
Andreas Dilger <adilger.kernel@dilger.ca>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
sandeen@redhat.com
Subject: Re: [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock
Date: Sat, 23 Apr 2011 00:10:25 +0200 [thread overview]
Message-ID: <20110422221025.GF2977@quack.suse.cz> (raw)
In-Reply-To: <20110422155839.3295e8e8.toshi.okajima@jp.fujitsu.com>
On Fri 22-04-11 15:58:39, Toshiyuki Okajima wrote:
> I have confirmed that the following patch works fine while my or
> Mizuma-san's reproducer is running. Therefore,
> we can block to write the data, which is mmapped to a file, into a disk
> by a page-fault while fsfreezing.
>
> I think this patch fixes the following two problems:
> - A deadlock occurs between ext4_da_writepages() (called from
> writeback_inodes_wb) and thaw_super(). (reported by Mizuma-san)
> - We can also write the data, which is mmapped to a file,
> into a disk while fsfreezing (ext3/ext4).
> (reported by me)
>
> Please examine this patch.
Thanks for the patch. The ext3 part is not as easy as this. You cannot
really get i_alloc_sem in ext3_page_mkwrite() because mmap_sem is already
held by page fault code and i_alloc_sem should be acquired before it (yes I
know, ext4 already has this bug which should be fixed when I get to it).
Also you'll find that performance of random writers via mmap (which is
relatively common) is going to be rather bad with this patch (because the
file will be heavily fragmented). We have to be more clever which is
exactly why it's taking me so long with my patch :) But tests are already
running so if everything goes fine, I should have patches to submit next
week.
The ext4 part looks correct. I'd just also like to have some comments about
how freeze handling is done because it's kind of subtle.
Honza
> diff --git a/fs/ext3/file.c b/fs/ext3/file.c
> index f55df0e..6d376ef 100644
> --- a/fs/ext3/file.c
> +++ b/fs/ext3/file.c
> @@ -52,6 +52,23 @@ static int ext3_release_file (struct inode * inode, struct file * filp)
> return 0;
> }
>
> +static const struct vm_operations_struct ext3_file_vm_ops = {
> + .fault = filemap_fault,
> + .page_mkwrite = ext3_page_mkwrite,
> +};
> +
> +static int ext3_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct address_space *mapping = file->f_mapping;
> +
> + if (!mapping->a_ops->readpage)
> + return -ENOEXEC;
> + file_accessed(file);
> + vma->vm_ops = &ext3_file_vm_ops;
> + vma->vm_flags |= VM_CAN_NONLINEAR;
> + return 0;
> +}
> +
> const struct file_operations ext3_file_operations = {
> .llseek = generic_file_llseek,
> .read = do_sync_read,
> @@ -62,7 +79,7 @@ const struct file_operations ext3_file_operations = {
> #ifdef CONFIG_COMPAT
> .compat_ioctl = ext3_compat_ioctl,
> #endif
> - .mmap = generic_file_mmap,
> + .mmap = ext3_file_mmap,
> .open = dquot_file_open,
> .release = ext3_release_file,
> .fsync = ext3_sync_file,
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index 68b2e43..66c31dd 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -3496,3 +3496,74 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val)
>
> return err;
> }
> +
> +int ext3_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> + struct page *page = vmf->page;
> + loff_t size;
> + unsigned long len;
> + int ret = -EINVAL;
> + void *fsdata;
> + struct file *file = vma->vm_file;
> + struct inode *inode = file->f_path.dentry->d_inode;
> + struct address_space *mapping = inode->i_mapping;
> +
> + /*
> + * Get i_alloc_sem to stop truncates messing with the inode. We cannot
> + * get i_mutex because we are already holding mmap_sem.
> + */
> + down_read(&inode->i_alloc_sem);
> + size = i_size_read(inode);
> + if (page->mapping != mapping || size <= page_offset(page)
> + || !PageUptodate(page)) {
> + /* page got truncated from under us? */
> + goto out_unlock;
> + }
> + ret = 0;
> + if (PageMappedToDisk(page))
> + goto out_frozen;
> +
> + if (page->index == size >> PAGE_CACHE_SHIFT)
> + len = size & ~PAGE_CACHE_MASK;
> + else
> + len = PAGE_CACHE_SIZE;
> +
> + lock_page(page);
> + /*
> + * return if we have all the buffers mapped. This avoid
> + * the need to call write_begin/write_end which does a
> + * journal_start/journal_stop which can block and take
> + * long time
> + */
> + if (page_has_buffers(page)) {
> + if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> + buffer_unmapped)) {
> + unlock_page(page);
> +out_frozen:
> + vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> + goto out_unlock;
> + }
> + }
> + unlock_page(page);
> + /*
> + * OK, we need to fill the hole... Do write_begin write_end
> + * to do block allocation/reservation.We are not holding
> + * inode.i__mutex here. That allow * parallel write_begin,
> + * write_end call. lock_page prevent this from happening
> + * on the same page though
> + */
> + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> + len, AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
> + if (ret < 0)
> + goto out_unlock;
> + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> + len, len, page, fsdata);
> + if (ret < 0)
> + goto out_unlock;
> + ret = 0;
> +out_unlock:
> + if (ret)
> + ret = VM_FAULT_SIGBUS;
> + up_read(&inode->i_alloc_sem);
> + return ret;
> +}
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index f2fa5e8..44979ae 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5812,7 +5812,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> }
> ret = 0;
> if (PageMappedToDisk(page))
> - goto out_unlock;
> + goto out_frozen;
>
> if (page->index == size >> PAGE_CACHE_SHIFT)
> len = size & ~PAGE_CACHE_MASK;
> @@ -5830,6 +5830,8 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> ext4_bh_unmapped)) {
> unlock_page(page);
> +out_frozen:
> + vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> goto out_unlock;
> }
> }
> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> index 85c1d30..a0e39ca 100644
> --- a/include/linux/ext3_fs.h
> +++ b/include/linux/ext3_fs.h
> @@ -919,6 +919,7 @@ extern void ext3_get_inode_flags(struct ext3_inode_info *);
> extern void ext3_set_aops(struct inode *inode);
> extern int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> u64 start, u64 len);
> +extern int ext3_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
>
> /* ioctl.c */
> extern long ext3_ioctl(struct file *, unsigned int, unsigned long);
> --
> 1.5.5.6
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2011-04-22 22:10 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-07 11:53 [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Masayoshi MIZUMA
2011-02-15 16:06 ` Jan Kara
2011-02-15 17:03 ` Ted Ts'o
2011-02-15 17:29 ` Jan Kara
2011-02-15 18:04 ` Ted Ts'o
2011-02-15 19:11 ` Jan Kara
2011-02-15 23:17 ` Toshiyuki Okajima
2011-02-16 14:56 ` Jan Kara
2011-02-17 3:50 ` Toshiyuki Okajima
2011-02-17 5:13 ` Andreas Dilger
2011-02-17 10:41 ` Jan Kara
2011-02-17 10:45 ` Jan Kara
2011-03-28 8:06 ` [RFC][PATCH] " Toshiyuki Okajima
2011-03-30 14:12 ` Jan Kara
2011-03-31 8:37 ` Yongqiang Yang
2011-03-31 8:48 ` Yongqiang Yang
2011-03-31 14:04 ` Eric Sandeen
2011-03-31 14:36 ` Yongqiang Yang
2011-03-31 15:25 ` Eric Sandeen
2011-03-31 16:28 ` Jan Kara
2011-03-31 12:03 ` Toshiyuki Okajima
2011-04-05 10:25 ` Toshiyuki Okajima
2011-04-05 22:54 ` Jan Kara
2011-04-06 5:09 ` Toshiyuki Okajima
2011-04-06 5:57 ` Jan Kara
2011-04-06 7:40 ` Toshiyuki Okajima
2011-04-06 17:46 ` Jan Kara
2011-04-15 13:39 ` Toshiyuki Okajima
2011-04-15 17:13 ` Jan Kara
2011-04-15 17:17 ` Eric Sandeen
2011-04-15 17:37 ` Jan Kara
2011-04-18 9:05 ` Toshiyuki Okajima
2011-04-18 10:51 ` Jan Kara
2011-04-19 9:43 ` Toshiyuki Okajima
2011-04-22 6:58 ` Toshiyuki Okajima
2011-04-22 21:26 ` Peter M. Petrakis
2011-04-22 21:40 ` Jan Kara
2011-04-22 22:57 ` Peter M. Petrakis
2011-04-22 22:10 ` Jan Kara [this message]
2011-04-25 6:28 ` Toshiyuki Okajima
2011-05-03 8:06 ` Surbhi Palande
2011-05-03 11:01 ` Surbhi Palande
2011-05-03 13:08 ` (unknown), Surbhi Palande
2011-05-03 13:46 ` your mail Jan Kara
2011-05-03 13:56 ` Surbhi Palande
2011-05-03 15:26 ` Surbhi Palande
2011-05-03 15:36 ` Jan Kara
2011-05-03 15:43 ` Surbhi Palande
2011-05-04 19:24 ` Jan Kara
2011-05-06 15:20 ` [RFC][PATCH] Do not accept a new handle when the F.S is frozen Surbhi Palande
2011-05-06 15:20 ` [PATCH] Adding support to freeze and unfreeze a journal Surbhi Palande
2011-05-06 20:56 ` Andreas Dilger
2011-05-07 20:04 ` [PATCH v2] " Surbhi Palande
2011-05-08 8:24 ` Marco Stornelli
2011-05-09 9:04 ` Surbhi Palande
2011-05-09 9:24 ` Jan Kara
2011-05-09 9:53 ` Jan Kara
2011-05-09 13:49 ` Surbhi Palande
2011-05-09 14:51 ` [PATCH v3] " Surbhi Palande
2011-05-09 15:08 ` Jan Kara
2011-05-10 15:07 ` [PATCH] " Surbhi Palande
2011-05-10 21:07 ` Andreas Dilger
2011-05-11 7:46 ` Surbhi Palande
2011-05-09 15:23 ` [PATCH v3] " Eric Sandeen
2011-05-11 7:06 ` Surbhi Palande
2011-05-11 7:10 ` [PATCH] Attempt to sync the fsstress writes to a frozen F.S Surbhi Palande
2011-05-12 14:22 ` Eric Sandeen
2011-05-24 21:42 ` Ted Ts'o
2011-05-25 12:00 ` Surbhi Palande
2011-05-25 12:12 ` Theodore Tso
2011-05-27 16:28 ` Jan Kara
2011-05-11 9:05 ` [PATCH v3] Adding support to freeze and unfreeze a journal Andreas Dilger
2011-05-12 9:40 ` Surbhi Palande
2011-05-03 13:08 ` [PATCH] Prevent dirtying a page when ext4 F.S is frozen Surbhi Palande
2011-05-03 15:19 ` [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Jan Kara
2011-05-04 12:09 ` Surbhi Palande
2011-05-04 19:19 ` Jan Kara
2011-05-04 21:34 ` Surbhi Palande
2011-05-04 22:48 ` Jan Kara
2011-05-05 6:06 ` Surbhi Palande
2011-05-05 11:18 ` Jan Kara
2011-05-05 14:01 ` Surbhi Palande
2011-03-31 23:40 ` Dave Chinner
2011-03-31 23:53 ` Eric Sandeen
2011-04-01 14:08 ` Jan Kara
2011-04-06 5:40 ` Dave Chinner
2011-04-06 6:18 ` Jan Kara
2011-04-06 11:21 ` Dave Chinner
2011-04-06 13:44 ` Christoph Hellwig
2011-04-06 22:59 ` Dave Chinner
2011-04-06 17:40 ` Jan Kara
2011-04-06 22:54 ` Dave Chinner
2011-04-08 21:33 ` Jan Kara
2011-05-02 9:07 ` Surbhi Palande
2011-05-02 10:56 ` Jan Kara
2011-05-02 11:27 ` Surbhi Palande
2011-05-02 12:06 ` Surbhi Palande
2011-05-02 12:20 ` Jan Kara
2011-05-02 12:30 ` Surbhi Palande
2011-05-02 13:16 ` Jan Kara
2011-05-02 13:22 ` Christoph Hellwig
2011-05-02 14:20 ` Jan Kara
2011-05-02 14:41 ` Christoph Hellwig
2011-05-02 16:23 ` Jan Kara
2011-05-02 16:38 ` Christoph Hellwig
2011-05-02 13:22 ` Surbhi Palande
2011-05-02 13:24 ` Christoph Hellwig
2011-05-02 13:27 ` Surbhi Palande
2011-05-02 14:26 ` Jan Kara
2011-05-02 14:04 ` Eric Sandeen
2011-05-03 7:27 ` Surbhi Palande
2011-05-03 20:14 ` Eric Sandeen
2011-05-04 8:26 ` Surbhi Palande
2011-05-04 14:30 ` Eric Sandeen
2011-05-02 14:01 ` Eric Sandeen
2011-04-05 10:44 ` Toshiyuki Okajima
2011-12-09 1:56 ` Masayoshi MIZUMA
2011-12-15 12:41 ` Masayoshi MIZUMA
2013-11-29 4:58 ` Yongqiang Yang
2013-11-29 8:00 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110422221025.GF2977@quack.suse.cz \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=m.mizuma@jp.fujitsu.com \
--cc=sandeen@redhat.com \
--cc=toshi.okajima@jp.fujitsu.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).