From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Sat, 29 Oct 2016 20:17:53 +0100 From: Al Viro To: Linus Torvalds Cc: Christoph Hellwig , Jan Kara , Dmitry Monakhov , Jeff Moyer , linux-fsdevel , linux-aio@kvack.org, Linux Kernel Mailing List , stable Subject: Re: [PATCH] aio: fix a user triggered use after free (and fix freeze protection of aio writes) Message-ID: <20161029191753.GU19539@ZenIV.linux.org.uk> References: <1477727070-18806-1-git-send-email-hch@lst.de> <1477727070-18806-2-git-send-email-hch@lst.de> <20161029122451.GQ19539@ZenIV.linux.org.uk> <20161029152017.GA7388@lst.de> <20161029185222.GT19539@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sat, Oct 29, 2016 at 12:07:07PM -0700, Linus Torvalds wrote: > On Sat, Oct 29, 2016 at 11:52 AM, Al Viro wrote: > > > > And that call can happen as soon as we return from __blockdev_direct_IO() > > (even earlier, actually). As soon as that happens, the reference to > > struct file we'd acquired in io_submit_one() is dropped. If descriptor > > table had been shared, another thread might have already closed that sucker, > > and fput() from aio_complete() would free struct file. > > But that's the point. We don't *do* anything like that any more. We > now always do the final access from aio_complete(). So it doesn't > matter if that is called asynchronously (very early) or not. > > That's the whole point of the patch. Exactly to do everything either > *before* we even submit it (at which point no completion can happen), > or doing it in aio_complete() which is guaranteed to be after the > submission. No races, no use-after-free. > > What am I missing? if (ret > 0) ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); in generic_file_write_iter() (inode might be gone here) written = mapping->a_ops->direct_IO(iocb, &data); /* * Finally, try again to invalidate clean pages which might have been * cached by non-direct readahead, or faulted in by get_user_pages() * if the source of the write was an mmap'ed region of the file * we're writing. Either one is a pretty crazy thing to do, * so we don't support it 100%. If this invalidation * fails, tough, the write still worked... */ if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); in generic_file_direct_write() (mapping points into inode, which might be freed) ret = __blockdev_direct_IO(iocb, inode, target->bt_bdev, &data, xfs_get_blocks_direct, NULL, NULL, 0); if (ret >= 0) { iocb->ki_pos += ret; iov_iter_advance(to, ret); } xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED); in xfs_file_dio_aio_read() (ip points to xfs-private part of inode). And that - just from a quick look. We *do* access inode between the call of __blockdev_direct_IO() and return from ->write_iter(). What's more, as soon as aio_complete() has happened, what's to stop close from another thread + umount + rmmod unmapping that ->write_iter() completely? AFAICS, the possibility of dropping the last reference to struct file before ->write_iter() has returned is fundamentally broken. I might be missing something subtle here, but...