From: Sergei Shtepa <sergei.shtepa@veeam.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"hch@infradead.org" <hch@infradead.org>,
	"darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"pavel@ucw.cz" <pavel@ucw.cz>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>,
	"jack@suse.cz" <jack@suse.cz>, "tj@kernel.org" <tj@kernel.org>,
	"gustavo@embeddedor.com" <gustavo@embeddedor.com>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"osandov@fb.com" <osandov@fb.com>,
	"koct9i@gmail.com" <koct9i@gmail.com>,
	"steve@sk2.org" <steve@sk2.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] Block layer filter - second version
Date: Wed, 21 Oct 2020 17:35:53 +0300
Message-ID: <20201021143553.GG20749@veeam.com>
In-Reply-To: <20201021130753.GM20115@casper.infradead.org>

The 10/21/2020 16:07, Matthew Wilcox wrote:
> On Wed, Oct 21, 2020 at 03:55:55PM +0300, Sergei Shtepa wrote:
> > The 10/21/2020 14:44, Matthew Wilcox wrote:
> > > I don't understand why O_DIRECT gets to bypass the block filter.  Nor do
> > > I understand why anybody would place a block filter on the swap device.
> > > But if somebody did place a filter on the swap device, why should swap
> > > be able to bypass the filter?
> > 
> > Yes, intercepting the swap partition is absurd. But we can't guarantee
> > that the filter won't intercept swap.
> > 
> > Swap operation is related to the memory allocation logic. If swap on
> > the block device is accessed during a memory allocation made by the
> > filter, a deadlock occurs. We could allow filters to occasionally shoot
> > themselves in the foot, especially under high load. But I think it's
> > better not to.
> 
> We already have logic to prevent this in Linux.  Filters need to
> call memalloc_noio_save() while they might cause swap to happen and
> memalloc_noio_restore() once it's safe for them to cause swap again.

Yes, I looked at this function, and it could really be useful for the filter.
Then I don't need to introduce the submit_bio_direct() function, and the wait
loop built around the queue polling function blk_mq_poll() will have
to be rewritten.
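
A minimal sketch of the scoped pattern suggested above, written as a
userspace model so it can be compiled and run outside the kernel. In the
kernel the real helpers are memalloc_noio_save() and memalloc_noio_restore()
from <linux/sched/mm.h>. Every name prefixed with model_ or MODEL_ below is
hypothetical and only stands in for kernel state; this is not code from the
patch.

```c
#include <stdbool.h>

#define MODEL_PF_NOIO 0x1u		/* stands in for PF_MEMALLOC_NOIO */

static unsigned int model_task_flags;	/* stands in for current->flags */

/* Enter a NOIO scope and return the previous state of the flag. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave the scope: put the flag back to what the caller saw on entry. */
static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Would an allocation made right now be allowed to start I/O such as swap? */
static bool model_alloc_may_do_io(void)
{
	return !(model_task_flags & MODEL_PF_NOIO);
}

/*
 * Hypothetical filter callback. It reports whether allocations made while
 * it handles a request could recurse into block I/O.
 */
static bool model_filter_intercept(void)
{
	unsigned int old = model_noio_save();
	bool may_io = model_alloc_may_do_io();	/* false inside the scope */

	model_noio_restore(old);
	return may_io;
}
```

Because the save call returns the old flag state and the restore call puts
it back, scopes nest: an inner restore does not clear a restriction that an
outer caller set.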

> 
> > "directly access" - it is not O_DIRECT. This means (I think) direct
> > reading from the device file, like "dd if=/dev/sda1".
> > As for intercepting direct reading, I don't know how to do the right thing.
> > 
> > The problem here is that __blkdev_direct_IO() in fs/block_dev.c
> > uses qc, the value returned by the submit_bio() function.
> > This value is used below when calling
> > blk_poll(bdev_get_queue(dev), qc, true).
> > The filter cannot return a meaningful value of the blk_qc_t type when
> > intercepting a request, because at that time it does not know which queue
> > the request will fall into.
> > 
> > If submit_bio() always returns BLK_QC_T_NONE, I think the
> > algorithm of __blkdev_direct_IO() will not work correctly.
> > If we need to intercept direct access to a block device, we need to at
> > least redo the __blkdev_direct_IO() function, getting rid of blk_poll().
> > I'm not sure it's necessary yet.
> 
> This isn't part of the block layer that I'm familiar with, so I can't
> help solve this problem, but allowing O_DIRECT to bypass the block filter
> is a hole that needs to be fixed before these patches can be considered.

I think there is no such problem, but I will check, of course.
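
A rough userspace model of the cookie concern quoted above. It assumes that
a poll call handed a cookie that is not valid finds nothing and returns 0,
and that the waiter then sleeps until the completion handler wakes it. That
assumption should be checked against the actual blk_poll() and
__blkdev_direct_IO() code. All model_ and MODEL_ names are hypothetical and
are not kernel API.

```c
#include <stdbool.h>

typedef unsigned int model_qc_t;		/* stands in for blk_qc_t */
#define MODEL_QC_NONE ((model_qc_t)-1)		/* stands in for BLK_QC_T_NONE */

static bool model_qc_valid(model_qc_t cookie)
{
	return cookie != MODEL_QC_NONE;
}

/* Returns the number of completions found; 0 tells the caller to sleep. */
static int model_poll(model_qc_t cookie)
{
	if (!model_qc_valid(cookie))
		return 0;	/* no queue to poll: nothing found */
	return 1;		/* pretend polling reaped the completion */
}

enum model_wait { MODEL_POLLED, MODEL_SLEPT };

/* How a direct-I/O style waiter ends up waiting for a given cookie. */
static enum model_wait model_wait_for_io(model_qc_t cookie)
{
	if (model_poll(cookie) > 0)
		return MODEL_POLLED;
	/* stand-in for sleeping until the bio completion wakes the waiter */
	return MODEL_SLEPT;
}
```

Under that assumption, a filter that returns the "none" cookie costs the
caller its polling fast path but does not prevent the wait from finishing.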

-- 
Sergei Shtepa
Veeam Software developer.
