public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <kch@nvidia.com>
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Ira Weiny <ira.weiny@intel.com>
Subject: Re: [RFC PATCH] nvmet-tcp: Don't kmap() pages which can't come from HIGHMEM
Date: Wed, 17 Aug 2022 14:02:54 +0200	[thread overview]
Message-ID: <3177256.oiGErgHkdL@opensuse> (raw)
In-Reply-To: <c11e62ae-3439-e3e3-adba-d7f5f951791c@grimberg.me>

On mercoledì 17 agosto 2022 11:44:09 CEST Sagi Grimberg wrote:
> >> Therefore, I have two questions: am I right about thinking that the pages
> >> mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL?
> > 
> > I think you are correct.
> 
> It is correct. It is the same model for the linux scsi target, sunrpc
> etc.

I'll try to address the comments from the two last messages from Keith and 
Sagi with this email (I replied yesterday to Chaitanya).

First of all: good to know that it is the same model for other subsystem. This 
is useful to know. Thanks!

> >> If so, can anyone with more knowledge than mine please say if my changes
> >> make
> >> any sense?
> > 
> > I think it does make sense.

Thanks, I'm glad I was not wrong :-)

> > I like the code simplification, though this use
> > was't really paying the kmap penalty since, as you mentioned, this is 
never
> > highmem.

Correct, however everybody like code simplification. I added a couple of 
sentences to kmap_local_page() documentation in highmem.rst. They clearly 
state that, when users know that pages cannot come from Highmem, they may 
better prefer page_address().

The changes to nvmet-tcp started with trying to convert kmap() / kunmap() to 
kmap_local_page() /kunmap_local(), but it ended up to code shortening and 
simplification with a plain use of page_address(). 

Obviously, due to my little experience with kernel developing and less than 
little knowledge of this protocol I had to ask whether or not I was right in 
identifying the site of the allocations.

The reasons why I had to use page_address() will be clearer reading what 
follows...

> Yes, its the same code-path. Would be great if we still had an
> abstraction that would do the right thing regardless of highmem or
> not like kmap provides though.

It would be great and it is already possible (this is why Thomas Gleixner 
created this kmap_local_page() API) but here we have a huge issue. kmap() and 
kmap_atomic() have recently been deprecated and they shouldn't any longer be 
used in new code: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/ ("[PATCH] checkpatch: Add kmap and kmap_atomic to the 
deprecated list").

kmap_local_page() always does the right thing: users can call it with or 
without HIGHMEM enabled, in-atomic (also in interrupts) or in preemptible 
contexts, they can take page faults. 

It doesn't require global lock for synchronization and doesn't require global 
TLB invalidation when the kmap's pool wraps and doesn't block waiting for free 
slots. 

Nice, isn't it?

However, with nvmet-tcp we cannot easily use kmap_local_page() because it 
comes with a major problem: it's local to the thread. If users handed the 
kernel virtual addresses returned by this function to other threads, the 
pointers would be invalid.

Here kmap() and kunmap() call sites are in two different workqueues. 
Therefore, there is no way to convert kmap() to kmap_local_page(), unless this 
code is heavily refactored.

Knowing that the pages cannot come from Highmem avoids this refactoring and in 
the meantime it allows us to delete the kmap() and kunmap() calls sites.

> > You should also remove the cmd's 'nr_mapped' field while you're at it,
> > otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
> 
> Not remove nr_mapped because we use it to know the iovec entries, but
> we can just remove the WARN statement.

Ah, OK. I'll take care of this too. That was not my first concern when I did 
the RFC. The "real" patch must also address this detail.

@Chaitanya:

Since this is a mere simplification and shorten of code, I suppose I can skip 
the performance tests. Ira and I have still hundreds of call sites with kmap() 
and kmap_atomic() which we should care of, therefore we prefer to leave alone 
everything that is not strictly necessary for the deprecated API deletions.

Thanks to you all,

Fabio



  reply	other threads:[~2022-08-17 12:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-16  9:18 [RFC PATCH] nvmet-tcp: Don't kmap() pages which can't come from HIGHMEM Fabio M. De Francesco
2022-08-16 13:12 ` Chaitanya Kulkarni
2022-08-16 18:16   ` Fabio M. De Francesco
2022-08-16 18:59 ` Keith Busch
2022-08-17  9:44   ` Sagi Grimberg
2022-08-17 12:02     ` Fabio M. De Francesco [this message]
2022-08-17 23:42       ` Chaitanya Kulkarni
2022-08-17 14:18     ` Keith Busch
2022-08-17 14:25       ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3177256.oiGErgHkdL@opensuse \
    --to=fmdefrancesco@gmail.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=kbusch@kernel.org \
    --cc=kch@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox