public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Eryu Guan <eguan@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: warnings complaining IOMAP_DELALLOC blocks in iomap_dio_actor from generic/446
Date: Mon, 24 Jul 2017 09:05:51 -0700	[thread overview]
Message-ID: <20170724160551.GB4352@magnolia> (raw)
In-Reply-To: <20170724145019.GG9167@eguan.usersys.redhat.com>

On Mon, Jul 24, 2017 at 10:50:19PM +0800, Eryu Guan wrote:
> Hi all,
> 
> Currently generic/446 could trigger a warning in iomap_dio_actor()
> easily, it's complaining about unexpected iomap->type (see the end for
> full call trace).
> 
> fs/iomap.c: iomap_dio_actor()
>  859         default:                                                                                                                                                                          
>  860                 WARN_ON_ONCE(1);                                                                                                                                                          
>  861                 return -EIO;
> 
> It's due to the race between direct read and mmap write pagefault on the
> same *sparse* file. 
> 
> direct read process			mmap write process
> xfs_file_dio_aio_read (take IOLOCK_SHARED)
>  iomap_dio_rw
>   iomap_apply
>    filemap_write_and_wait_range
>    invalidate_inode_pages2_range
>    iomap_apply
> 					mmap
> 					 xfs_filemap_page_mkwrite (take MMAPLOCK_SHARED)
> 					  iomap_page_mkwrite
> 					   iomap_apply
> 					    xfs_file_iomap_begin
> 					     xfs_file_iomap_begin_delay (take ILOCK_EXCL)
> 					     (release ILOCK_EXCL)
> 					     ...
>     xfs_file_iomap_begin
>     (take ILOCK and read in bmap info)
>     iomap_dio_actor
>     ...
> 
> The dio path and page_mkwrite path are taking different locks so they're
> not serialized. So after dio read flushing the file range but before
> taking ILOCK, the page faults from mmap write could fault in and update
> the file map first with delalloc blocks. Then the dio reader sees this
> delalloc block map unexpectedly.
> 
> I'm not sure what's the best way to fix it, but a quick hack of
> disabling delalloc in the write page fault path could do the work for
> me, e.g.
> 
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -981,7 +981,7 @@ xfs_file_iomap_begin(
>         if (XFS_FORCED_SHUTDOWN(mp))
>                 return -EIO;
> 
> -       if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
> +       if (((flags & (IOMAP_WRITE|IOMAP_DIRECT|IOMAP_FAULT)) == IOMAP_WRITE) &&
>                         !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>                 /* Reserve delalloc blocks for regular writeback. */
>                 return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> 
> So that mmap write page fault brings in already allocated blocks, and
> dio reader sees non-IOMAP_DELALLOC iomaps.

But now we can't take advantage of delayed allocation for mmap writes
even when the user isn't being evil by peppering us with dio reads.

> I know concurrent dio and mmap io are not recommended, so is this
> something that doens't need a fix at all, and the test should filter out
> the warning instead?

XFS no longer BUG_ON, so I guess it's fine if the test filters out the
warning.

It looks like the end result of a dioread/mmapwrite collision is that
the dio reader gets -EIO.  Would it be better to return a short read?

--D

> 
> Thanks,
> Eryu
> 
> [162097.402359] ------------[ cut here ]------------
> [162097.402745] WARNING: CPU: 1 PID: 20582 at fs/iomap.c:860 iomap_dio_actor+0xca/0x370
> [162097.403392] Modules linked in: rpcsec_gss_krb5 nfsv4(OE) dns_resolver nfs(OE) fscache btrfs xor raid6_pq ppdev i2c_piix4 parport_pc sg virtio_balloon parport i2c_core pcspkr nfsd(OE) auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi ata_piix 8139too libata virtio_pci 8139cp floppy virtio_ring mii serio_raw virtio
> [162097.406258] CPU: 1 PID: 20582 Comm: xfs_io Tainted: G        W  OE   4.13.0-rc1 #191
> [162097.406898] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [162097.407388] task: ffff880212e40000 task.stack: ffffc900012ac000
> [162097.408403] RIP: 0010:iomap_dio_actor+0xca/0x370
> [162097.409072] RSP: 0018:ffffc900012afc20 EFLAGS: 00010202
> [162097.409533] RAX: 0000000000000002 RBX: ffffc900012afcb0 RCX: 000000000241b000
> [162097.410144] RDX: 00000000000001ff RSI: ffffc900012afe30 RDI: ffffc900012afe68
> [162097.410720] RBP: ffffc900012afc90 R08: ffffc900012afcb0 R09: ffff880211a65c60
> [162097.411324] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [162097.411912] R13: ffffffffa016bb60 R14: ffff880211a65c60 R15: 0000000000000009
> [162097.412503] FS:  00007f94e825a740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> [162097.413181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [162097.413648] CR2: 000000000241a000 CR3: 0000000212c22000 CR4: 00000000000006e0
> [162097.414261] Call Trace:
> [162097.414478]  ? iomap_dio_zero+0x100/0x100
> [162097.414808]  iomap_apply+0xa1/0x110
> [162097.415133]  iomap_dio_rw+0x20b/0x390
> [162097.415442]  ? iomap_dio_zero+0x100/0x100
> [162097.415804]  xfs_file_dio_aio_read+0x6e/0xf0 [xfs]
> [162097.416250]  xfs_file_read_iter+0xab/0xc0 [xfs]
> [162097.416629]  __vfs_read+0xe0/0x150
> [162097.416925]  vfs_read+0x8c/0x130
> [162097.417217]  SyS_pread64+0x87/0xb0
> [162097.417512]  do_syscall_64+0x67/0x150
> [162097.417818]  entry_SYSCALL64_slow_path+0x25/0x25
> [162097.418234] RIP: 0033:0x7f94e7e3cf73
> [162097.418533] RSP: 002b:00007fff7f52ae90 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
> [162097.419167] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f94e7e3cf73
> [162097.419740] RDX: 0000000000001000 RSI: 000000000241a000 RDI: 0000000000000003
> [162097.420339] RBP: 00007fff7f52af50 R08: 0000000000000000 R09: 0000000000000000
> [162097.420924] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> [162097.421516] R13: 0000000000020000 R14: 0000000000000000 R15: 0000000000000000
> [162097.422124] Code: e1 48 09 c1 48 85 ca 0f 85 82 02 00 00 0f b7 43 18 66 83 f8 03 0f 84 fb 01 00 00 66 83 f8 04 74 35 66 83 f8 01 0f 84 07 02 00 00 <0f> ff 48 c7 c0 fb ff ff ff 48 8b 4d d0 65 48 33 0c 25 28 00 00
> [162097.423661] ---[ end trace 927f98a352499782 ]---
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-07-24 16:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 14:50 warnings complaining IOMAP_DELALLOC blocks in iomap_dio_actor from generic/446 Eryu Guan
2017-07-24 16:05 ` Darrick J. Wong [this message]
2017-07-24 17:51   ` Eryu Guan
2017-07-24 18:51     ` Darrick J. Wong
2017-07-24 22:02     ` Dave Chinner
2017-07-25  7:53       ` Eryu Guan
2017-07-25 23:43         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724160551.GB4352@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=eguan@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox