From: Jens Axboe <axboe@kernel.dk>
To: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@lst.de>,
Matthew Wilcox <willy@infradead.org>,
Logan Gunthorpe <logang@deltatee.com>,
Christoph Hellwig <hch@infradead.org>,
Jeff Layton <jlayton@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 7/7] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate
Date: Mon, 9 Jan 2023 10:27:14 -0700 [thread overview]
Message-ID: <0113e8b2-0ce4-e0f1-7924-bb5389d168de@kernel.dk> (raw)
In-Reply-To: <1880793.1673257404@warthog.procyon.org.uk>
On 1/9/23 2:43?AM, David Howells wrote:
> Jens Axboe <axboe@kernel.dk> wrote:
>
>>> A field, bi_cleanup_mode, is added to the bio struct that gets set by
>>> iov_iter_extract_pages() with FOLL_* flags indicating what cleanup is
>>> necessary. FOLL_GET -> put_page(), FOLL_PIN -> unpin_user_page(). Other
>>> flags could also be used in future.
>>>
>>> Newly allocated bio structs have bi_cleanup_mode set to FOLL_GET to
>>> indicate that attached pages are ref'd by default. Cloning sets it to 0.
>>> __bio_iov_iter_get_pages() overrides it to what iov_iter_extract_pages()
>>> indicates.
>>
>> What's the motivation for this change?
>
> DIO reads in most filesystems and, I think, the block layer are
> currently broken with respect to concurrent fork in the same process
> because they take refs (FOLL_GET) on the pages involved which causes
> the CoW mechanism to malfunction, leading (I think) the parent process
> to not see the result of the DIO. IIRC, the pages undergoing DIO get
> forcibly copied by fork - and the copies given to the parent.
> Instead, DIO reads should be pinning the pages (FOLL_PIN). Maybe
> Willy can weigh in on this?
>
> Further, getting refs on pages in, say, a KVEC iterator is the wrong
> thing to do as the kvec may point to things that shouldn't be ref'd
> (vmap'd or vmalloc'd regions, for example). Instead, the in-kernel
> caller should do what it needs to do to keep hold of the memory and
> the DIO should not take a ref at all.
Makes sense!
>> It's growing struct bio, which we can have a lot of in the system. I read
>> the cover letter too and I can tell what the change does, but there's no
>> justification really for the change.
>
> The FOLL_* flags I'm getting back from iov_iter_extract_pages() can be mapped
> to BIO_* flags in the bio. For the moment, AFAIK, I think only FOLL_GET and
> FOLL_PIN are necessary. There are three cleanup types: put, unpin and do
> nothing.
That would work, or if they fit into a ushort, there is room that could
be utilized for that. My main knee jerk reaction is just plopping a full
into in there for 2 bits of state, really. I try pretty hard to not
succumb to struct bloat, particularly on the hot path and when we can
have lots of them. So that part needs to be done more carefully in v5.
--
Jens Axboe
next prev parent reply other threads:[~2023-01-09 17:28 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-07 0:33 [PATCH v4 0/7] iov_iter: Add extraction helpers David Howells
2023-01-07 0:33 ` [PATCH v4 1/7] iov_iter: Change the direction macros into an enum David Howells
2023-01-07 0:33 ` [PATCH v4 2/7] iov_iter: Use the direction in the iterator functions David Howells
2023-01-07 0:33 ` [PATCH v4 3/7] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction David Howells
2023-01-07 0:33 ` [PATCH v4 4/7] iov_iter: Add a function to extract a page list from an iterator David Howells
2023-01-07 0:34 ` [PATCH v4 5/7] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator David Howells
2023-01-07 0:34 ` [PATCH v4 6/7] netfs: Add a function to extract an iterator into a scatterlist David Howells
2023-01-07 0:34 ` [PATCH v4 7/7] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate David Howells
2023-01-09 3:54 ` Jens Axboe
2023-01-09 9:43 ` David Howells
2023-01-09 17:25 ` Jan Kara
2023-01-10 14:42 ` David Howells
2023-01-11 9:58 ` Jan Kara
2023-01-09 17:27 ` Jens Axboe [this message]
2023-01-09 17:35 ` Jan Kara
2023-01-09 21:37 ` David Howells
2023-01-09 21:57 ` Jens Axboe
2023-01-09 22:24 ` David Howells
2023-01-09 22:57 ` Jens Axboe
2023-01-10 14:37 ` David Howells
2023-01-10 21:41 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0113e8b2-0ce4-e0f1-7924-bb5389d168de@kernel.dk \
--to=axboe@kernel.dk \
--cc=dhowells@redhat.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=jlayton@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox