From: Ming Lei <ming.lei@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Zdenek Kabelac <zkabelac@redhat.com>,
Jens Axboe <axboe@kernel.dk>, Li Nan <linan666@huaweicloud.com>,
Christoph Hellwig <hch@infradead.org>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
linux-block@vger.kernel.org, dm-devel@lists.linux.dev,
ming.lei@redhat.com
Subject: Re: [PATCH v3 0/4] brd discard patches
Date: Tue, 23 Jan 2024 10:49:40 +0800
Message-ID: <Za8pRGZ9ZV3/jwCH@fedora>
In-Reply-To: <dc9e648b-6c5f-9642-8892-b48dbc893c6@redhat.com>
On Mon, Jan 22, 2024 at 05:30:07PM +0100, Mikulas Patocka wrote:
> Hi
>
>
> On Fri, 19 Jan 2024, Ming Lei wrote:
>
> > Hi Mikulas,
> >
> > On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > > Hi
> > >
> > > Here I'm submitting the ramdisk discard patches for the next merge window.
> > > If you want to make some more changes, please let me know.
> >
> > brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> > in 2017 because it was just a driver-private write_zero, and users can get
> > the same result with fallocate(FALLOC_FL_ZERO_RANGE).
> >
> > Also, you only mentioned the motivation in the V1 cover letter:
> >
> > https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@file01.intranet.prod.int.rdu2.redhat.com/
> >
> > ```
> > Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> > it would be beneficial to run the testsuite with discard enabled in order
> > to test discard handling.
> > ```
> >
> > But we have lots of test disks with discard support: loop, scsi_debug,
> > null_blk, ublk, ..., so one question is: why is brd discard a must for
> > the lvm2 testsuite to cover (lvm) discard handling?
>
> We should ask Zdeněk Kabeláč about it - he is the expert on the lvm2
> testsuite.
>
> > The reason why brd didn't support discard by freeing pages is the
> > writeback deadlock risk; see:
> >
> > commit f09a06a193d9 ("brd: remove discard support")
> >
> > -static void discard_from_brd(struct brd_device *brd,
> > -			sector_t sector, size_t n)
> > -{
> > -	while (n >= PAGE_SIZE) {
> > -		/*
> > -		 * Don't want to actually discard pages here because
> > -		 * re-allocating the pages can result in writeback
> > -		 * deadlocks under heavy load.
> > -		 */
> > -		if (0)
> > -			brd_free_page(brd, sector);
> > -		else
> > -			brd_zero_page(brd, sector);
> > -		sector += PAGE_SIZE >> SECTOR_SHIFT;
> > -		n -= PAGE_SIZE;
> > -	}
> > -}
> >
> > However, you didn't mention how your patches address this potential
> > risk. Care to document it? I can't find any mention of this
> > problem.
>
> The writeback deadlock can happen even without discard - if the machine
> runs out of memory while writing data to a ramdisk. But the probability is
> increased when discard is used, because pages are freed and re-allocated
> more often.

Yeah, I agree. What I meant is that this needs to be documented, given
that discard is being re-introduced and the original deadlock comment
isn't addressed.

>
> Generally, the admin should make sure that the machine has enough
> available memory when creating a ramdisk - then, the deadlock can't
> happen.
>
> Ramdisk has no limit on the number of allocated pages, so when it runs out
> of memory, the oom killer will try to kill unrelated processes and the
> machine will hang. If there is risk of overflowing the available memory,
> the admin should use tmpfs instead of a ramdisk - tmpfs can be configured
> with a limit and it can also swap out pages.
>
> > BTW, your patches look more complicated than the original removed
> > discard implementation. If the above questions get addressed,
> > I would be happy to review the following patches.
>
> My patches actually free the discarded pages. The original discard
> implementation just overwrote the pages with zeroes without freeing them.

The original implementation did support discard by freeing pages; that
path was just bypassed unconditionally by:

	if (0)
		brd_free_page(brd, sector);
	else
		brd_zero_page(brd, sector);

However, a page could be freed by discard while it is being consumed in
brd_do_bvec().

Maybe your patch "brd: extend the rcu regions to cover read and write"
can be simplified a bit, for example:

- grab the rcu read lock in brd_do_bvec()

- release the rcu read lock when allocating a page via alloc_page() in
  brd_insert_page()

- free pages via rcu

Or avoid it by holding a page reference:

- grab a page reference in brd_lookup_page() if it is called from
  copy_to_brd() or copy_from_brd(), and drop it after the page is consumed

Thanks,
Ming
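
To illustrate the page-reference alternative suggested above, the following is a minimal single-threaded userspace sketch in C. It is not kernel code and does not come from any patch in this thread: `sketch_page`, `sketch_store`, and the `sketch_*` functions are hypothetical stand-ins, and the plain `int` reference count leaves out the locking or RCU protection that a real lookup racing with discard would need. It only shows the lifetime rule: discard drops the store's reference, and the memory is released when the last user drops theirs.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_SLOTS  8

/* Hypothetical stand-in for a ramdisk page; not the kernel's struct page. */
struct sketch_page {
	int refcount;	/* 1 held by the store, plus 1 per active user */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for the per-device page store. */
struct sketch_store {
	struct sketch_page *slots[SKETCH_NR_SLOTS];
	int pages_freed;	/* counts real frees, for illustration only */
};

/* Drop one reference; release the memory when the last one goes away. */
void sketch_put_page(struct sketch_store *s, struct sketch_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		free(p);
		s->pages_freed++;
	}
}

/* Return the page in slot idx, allocating a zeroed one if it is empty. */
struct sketch_page *sketch_insert_page(struct sketch_store *s, int idx)
{
	struct sketch_page *p;

	assert(idx >= 0 && idx < SKETCH_NR_SLOTS);
	if (s->slots[idx])
		return s->slots[idx];
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->refcount = 1;	/* the store's own reference */
	s->slots[idx] = p;
	return p;
}

/* Look up a page and take a reference on behalf of the caller. */
struct sketch_page *sketch_lookup_get(struct sketch_store *s, int idx)
{
	struct sketch_page *p;

	assert(idx >= 0 && idx < SKETCH_NR_SLOTS);
	p = s->slots[idx];
	if (p)
		p->refcount++;
	return p;
}

/* Discard: unlink the page and drop only the store's reference. */
void sketch_discard(struct sketch_store *s, int idx)
{
	struct sketch_page *p;

	assert(idx >= 0 && idx < SKETCH_NR_SLOTS);
	p = s->slots[idx];
	if (!p)
		return;
	s->slots[idx] = NULL;
	sketch_put_page(s, p);
}
```

With this rule, a caller that obtained a page from `sketch_lookup_get()` before a discard can keep using its data until it calls `sketch_put_page()`; only then is the page freed.
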
Thread overview: 10+ messages
2023-08-10 10:07 [PATCH v3 0/4] brd discard patches Mikulas Patocka
2023-08-10 10:08 ` [PATCH v3 1/4] brd: use a switch statement in brd_submit_bio Mikulas Patocka
2023-08-10 10:09 ` [PATCH v3 2/4] brd: extend the rcu regions to cover read and write Mikulas Patocka
2023-08-10 10:09 ` [PATCH v3 3/4] brd: enable discard Mikulas Patocka
2023-08-10 10:10 ` [PATCH v3 4/4] brd: implement write zeroes Mikulas Patocka
2023-11-10 1:22 ` [PATCH v3 0/4] brd discard patches Li Nan
2023-11-14 13:59 ` Mikulas Patocka
2024-01-19 8:41 ` Ming Lei
2024-01-22 16:30 ` Mikulas Patocka
2024-01-23 2:49 ` Ming Lei [this message]