From: Ming Lei <ming.lei@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Zdenek Kabelac <zkabelac@redhat.com>,
	Jens Axboe <axboe@kernel.dk>, Li Nan <linan666@huaweicloud.com>,
	Christoph Hellwig <hch@infradead.org>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	linux-block@vger.kernel.org, dm-devel@lists.linux.dev,
	ming.lei@redhat.com
Subject: Re: [PATCH v3 0/4] brd discard patches
Date: Tue, 23 Jan 2024 10:49:40 +0800
Message-ID: <Za8pRGZ9ZV3/jwCH@fedora>
In-Reply-To: <dc9e648b-6c5f-9642-8892-b48dbc893c6@redhat.com>

On Mon, Jan 22, 2024 at 05:30:07PM +0100, Mikulas Patocka wrote:
> Hi
> 
> 
> On Fri, 19 Jan 2024, Ming Lei wrote:
> 
> > Hi Mikulas,
> > 
> > On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > Here I'm submitting the ramdisk discard patches for the next merge window. 
> > > If you want to make some more changes, please let me know.
> > 
> > brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> > in 2017 because it was just a driver-private write_zero, and users can get
> > the same result with fallocate(FALLOC_FL_ZERO_RANGE).
> > 
> > Also, you only mentioned the motivation in the V1 cover letter:
> > 
> > https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@file01.intranet.prod.int.rdu2.redhat.com/
> > 
> > ```
> > Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> > it would be beneficial to run the testsuite with discard enabled in order
> > to test discard handling.
> > ```
> > 
> > But we have lots of test disks with discard support: loop, scsi_debug,
> > null_blk, ublk, ..., so one question is why brd discard is
> > a must for the lvm2 testsuite to cover (lvm) discard handling?
> 
> We should ask Zdeněk Kabeláč about it - he is the expert on the lvm2 
> testsuite.
> 
> > The reason why brd didn't support discard by freeing pages is writeback
> > deadlock risk, see:
> > 
> > commit f09a06a193d9 ("brd: remove discard support")
> > 
> > -static void discard_from_brd(struct brd_device *brd,
> > -                       sector_t sector, size_t n)
> > -{
> > -       while (n >= PAGE_SIZE) {
> > -               /*
> > -                * Don't want to actually discard pages here because
> > -                * re-allocating the pages can result in writeback
> > -                * deadlocks under heavy load.
> > -                */
> > -               if (0)
> > -                       brd_free_page(brd, sector);
> > -               else
> > -                       brd_zero_page(brd, sector);
> > -               sector += PAGE_SIZE >> SECTOR_SHIFT;
> > -               n -= PAGE_SIZE;
> > -       }
> > -}
> > 
> > However, you didn't mention how your patches address this potential
> > risk, care to document it? I can't find any related words about
> > this problem.
> 
> The writeback deadlock can happen even without discard - if the machine 
> runs out of memory while writing data to a ramdisk. But the probability is 
> increased when discard is used, because pages are freed and re-allocated 
> more often.

Yeah, I agree. What I meant is that this needs to be documented,
given that discard is being re-introduced and the original deadlock
comment isn't addressed.

> 
> Generally, the admin should make sure that the machine has enough 
> available memory when creating a ramdisk - then, the deadlock can't 
> happen.
> 
> Ramdisk has no limit on the number of allocated pages, so when it runs out 
> of memory, the oom killer will try to kill unrelated processes and the 
> machine will hang. If there is risk of overflowing the available memory, 
> the admin should use tmpfs instead of a ramdisk - tmpfs can be configured 
> with a limit and it can also swap out pages.
> 
> > BTW, your patches look more complicated than the original removed
> > discard implementation. And if the above questions get addressed,
> > I am happy to review the following patches.
> 
> My patches actually free the discarded pages. The original discard 
> implementation just overwrote the pages with zeroes without freeing them.

The original implementation did support discard by freeing pages; that
path was just bypassed unconditionally by:

               if (0)
                       brd_free_page(brd, sector);
               else
                       brd_zero_page(brd, sector);

However, a page could be freed by discard while it is being consumed in
brd_do_bvec().

Maybe your patch "brd: extend the rcu regions to cover read and write"
can be simplified a bit, for example:

- grab the rcu read lock in brd_do_bvec()
- release the rcu read lock while allocating a page via alloc_page() in
  brd_insert_page()
- free pages via rcu

Or avoid it by holding a page reference:

- grab a page reference in brd_lookup_page() if it is called from
  copy_to_brd() or copy_from_brd(), and drop it after the page is consumed
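
To make the page-reference idea a bit more concrete, here is a minimal
userspace sketch. It is not brd code: every name in it (sketch_store,
sketch_page, sketch_lookup_get(), sketch_put_page(), sketch_discard(),
sketch_demo()) is hypothetical, a plain array stands in for the driver's
page lookup structure, and it is single-threaded, so it only shows the
lifetime rule. In the driver, the lookup and the reference grab would still
have to be made safe against a concurrent removal, for example under rcu or
a lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  8

/* Hypothetical stand-in for a ramdisk page with a reference count. */
struct sketch_page {
	int refcount;	/* 1 held by the store, plus 1 per active user */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for the per-device page store. */
struct sketch_store {
	struct sketch_page *slots[SKETCH_NR_PAGES];
	int freed;	/* number of pages actually freed so far */
};

/* Drop one reference; the page is freed only when the last one goes. */
void sketch_put_page(struct sketch_store *s, struct sketch_page *p)
{
	if (--p->refcount == 0) {
		free(p);
		s->freed++;
	}
}

/* Allocate a zeroed page for idx if none exists; the store owns one ref. */
struct sketch_page *sketch_insert_page(struct sketch_store *s, size_t idx)
{
	struct sketch_page *p;

	if (idx >= SKETCH_NR_PAGES)
		return NULL;
	if (s->slots[idx])
		return s->slots[idx];
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->refcount = 1;
	s->slots[idx] = p;
	return p;
}

/* Look up a page and pin it; the caller must call sketch_put_page(). */
struct sketch_page *sketch_lookup_get(struct sketch_store *s, size_t idx)
{
	struct sketch_page *p = idx < SKETCH_NR_PAGES ? s->slots[idx] : NULL;

	if (p)
		p->refcount++;
	return p;
}

/* Discard: unlink the page and drop only the store's own reference. */
void sketch_discard(struct sketch_store *s, size_t idx)
{
	struct sketch_page *p;

	if (idx >= SKETCH_NR_PAGES || !s->slots[idx])
		return;
	p = s->slots[idx];
	s->slots[idx] = NULL;
	sketch_put_page(s, p);
}

/* A discard lands between lookup and use; the pinned page stays valid. */
int sketch_demo(void)
{
	struct sketch_store s = { 0 };
	struct sketch_page *p = sketch_insert_page(&s, 3);
	unsigned char byte;

	assert(p != NULL);
	memset(p->data, 0xab, SKETCH_PAGE_SIZE);

	p = sketch_lookup_get(&s, 3);	/* reader pins the page */
	sketch_discard(&s, 3);		/* discard runs in between */
	assert(s.freed == 0);		/* not freed: reader still holds a ref */
	byte = p->data[0];		/* safe to consume the page */
	sketch_put_page(&s, p);		/* last ref dropped, page freed now */
	assert(s.freed == 1);
	return byte == 0xab ? 0 : -1;
}
```

Calling sketch_demo() from a small main() walks through that interleaving.
The point is only that discard unlinks the page and drops the store's
reference, while the actual free is deferred until the last user is done.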


Thanks,
Ming