public inbox for linux-block@vger.kernel.org
From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>,
	Kundan Kumar <kundan.kumar@samsung.com>,
	axboe@kernel.dk, linux-block@vger.kernel.org,
	joshi.k@samsung.com, mcgrof@kernel.org, anuj20.g@samsung.com,
	nj.shetty@samsung.com, c.gameti@samsung.com,
	gost.dev@samsung.com
Subject: Re: [PATCH v2] block : add larger order folio size instead of pages
Date: Sun, 5 May 2024 14:10:14 +0200
Message-ID: <33717b97-8986-4d6e-aa10-47393b810ea2@suse.de>
In-Reply-To: <ZjZjBHAdUdt6FJe6@casper.infradead.org>

On 5/4/24 18:32, Matthew Wilcox wrote:
> On Sat, May 04, 2024 at 02:35:15PM +0200, Hannes Reinecke wrote:
>>> I think this is wandering into a minefield.  I'm pretty sure
>>> it's considered valid to split the bio, and complete the two halves
>>> independently.  Each one will put the refcounts for the pages it touches,
>>> and if we do this early putting of references, that's going to fail.
>>
>> Precisely my worries. Something I want to talk to you about at LSF;
>> refcounting of folios vs refcounting of pages.
>> When one takes a refcount on a folio we are actually taking a refcount
>> on the first page, which is okay if we stick with using the folio throughout
>> the call chain. But if we start mixing between pages and folios (as we do
>> here) we will be getting the refcount wrong.
>>
>> Do you have plans how we could improve the situation?
>> Like a warning 'Hey, you've used the folio for taking the reference, but now
>> you are releasing the references for the page'?
> 
> This is a fairly common misunderstanding, but TLDR: problem solved long
> before I started this project.
> 
> Individual pages don't actually have a refcount.  I know it looks
> like they do, and they kind of do, but for tail pages, the refcount is
> always 0.  Functions like get_page() and put_page() always operate on
> the head page (ie folio) refcount.
> 
Precisely.

> Specifically, I think you're concerned about pages coming from GUP.
> Take a look at try_get_folio().  We pass in a struct page, explicitly
> get the refcount on a folio, check the page is still part of the
> folio, then return the folio.  And we return the page to the caller
> because the caller needs to know the precise page at that address,
> not the folio that contains it.
> 
> There are functions which don't surreptitiously call compound_head()
> behind your back.  set_page_count(), for example.  And page_ref_count()
> (rather than the more normal page_count()).
> 
> And none of this is true if you don't use __GFP_COMP.  But let's call
> that an aberration that must die.

Ah, right. So taking or dropping a reference on a page always resolves
to the refcount of the enclosing folio.
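
To make that rule concrete, here is a minimal user-space sketch of it.
This is not kernel code: struct model_page and every model_* helper are
hypothetical stand-ins, and the plain 'head' pointer is a simplification
of how the kernel records the head page. It only illustrates the
behaviour described above: get/put go to the head page, tail pages stay
at 0, and a raw per-page read does no head lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's struct page. */
struct model_page {
	int refcount;             /* only ever non-zero on the head page */
	struct model_page *head;  /* NULL on the head page, else the head */
};

/* Model of a compound (__GFP_COMP) allocation of npages pages. */
static inline void model_init_compound(struct model_page *pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		pages[i].refcount = 0;
		pages[i].head = i ? &pages[0] : NULL;
	}
	pages[0].refcount = 1;    /* the allocation's own reference */
}

static inline struct model_page *model_compound_head(struct model_page *page)
{
	return page->head ? page->head : page;
}

/* In the spirit of get_page()/put_page(): always act on the head. */
static inline void model_get_page(struct model_page *page)
{
	model_compound_head(page)->refcount++;
}

static inline void model_put_page(struct model_page *page)
{
	model_compound_head(page)->refcount--;
}

/* In the spirit of page_ref_count(): raw value, no head lookup. */
static inline int model_page_ref_count(const struct model_page *page)
{
	return page->refcount;
}

/* In the spirit of page_count(): look up the head first. */
static inline int model_page_count(struct model_page *page)
{
	return model_page_ref_count(model_compound_head(page));
}

/* Take a reference via a tail page, then drop it via another tail page. */
static inline void model_demo(void)
{
	struct model_page pages[4];

	model_init_compound(pages, 4);
	model_get_page(&pages[3]);
	assert(model_page_ref_count(&pages[3]) == 0); /* tail stays at 0 */
	assert(model_page_count(&pages[3]) == 2);     /* head holds both refs */
	model_put_page(&pages[1]);                    /* any page of the folio */
	assert(model_page_count(&pages[0]) == 1);
}
```

In this model it does not matter which page of the compound allocation
is handed to get or put; the count that moves is always the head's.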

I was actually concerned with the iov_iter functions, where we take a 
reference for each page. Currently iov_iter is iterating in units of
PAGE_SIZE, so there is no easy way of converting that to folios.
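
A rough sketch of what the per-page iteration means for the count,
under stated assumptions: MODEL_PAGE_SIZE is fixed at 4096, and
struct model_folio and the model_* helpers are hypothetical user-space
stand-ins, not the iov_iter API. Walking a byte range in PAGE_SIZE units
and taking one reference per unit puts that many references on the
single folio count, and the release side has to drop exactly as many.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Hypothetical folio stand-in: one shared count for all of its pages. */
struct model_folio {
	int refcount;
	size_t nr_pages;
};

/* Number of PAGE_SIZE units touched by the range [offset, offset + len). */
static inline size_t model_pages_spanned(size_t offset, size_t len)
{
	if (!len)
		return 0;
	return (offset + len - 1) / MODEL_PAGE_SIZE
		- offset / MODEL_PAGE_SIZE + 1;
}

/* Per-page walk: one reference per unit, all landing on the folio. */
static inline size_t model_get_refs_per_page(struct model_folio *folio,
					     size_t offset, size_t len)
{
	size_t n = model_pages_spanned(offset, len);

	assert(offset + len <= folio->nr_pages * MODEL_PAGE_SIZE);
	for (size_t i = 0; i < n; i++)
		folio->refcount++;
	return n;
}

/* The release side must drop the same number of references. */
static inline void model_put_refs(struct model_folio *folio, size_t nr)
{
	folio->refcount -= (int)nr;
}

static inline void model_iter_demo(void)
{
	/* A 16-page (64 KiB) folio holding its initial reference. */
	struct model_folio folio = { .refcount = 1, .nr_pages = 16 };

	/* 10000 bytes starting 100 bytes in: touches pages 0, 1 and 2. */
	size_t taken = model_get_refs_per_page(&folio, 100, 10000);

	assert(taken == 3);
	assert(folio.refcount == 4);
	model_put_refs(&folio, taken);
	assert(folio.refcount == 1);
}
```

A folio-aware iterator could take a single reference for the same
range, which is why the get and put sides need to agree on the unit.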

But one step at a time, I guess. First get the blocksize > pagesize 
patches in.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


Thread overview: 11+ messages
     [not found] <CGME20240430175735epcas5p103ac74e1482eda3e393c0034cea8e9ff@epcas5p1.samsung.com>
2024-04-30 17:50 ` [PATCH v2] block : add larger order folio size instead of pages Kundan Kumar
2024-05-02  5:35   ` Christoph Hellwig
2024-05-07 11:19     ` Kundan Kumar
2024-05-02  6:45   ` Hannes Reinecke
2024-05-02 11:52     ` Kundan Kumar
2024-05-02 12:53     ` [PATCH v2] " Christoph Hellwig
2024-05-03 15:26       ` Matthew Wilcox
2024-05-03 16:22         ` Christoph Hellwig
2024-05-04 12:35         ` Hannes Reinecke
2024-05-04 16:32           ` Matthew Wilcox
2024-05-05 12:10             ` Hannes Reinecke [this message]
