From: Ingo Molnar <mingo@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Boaz Harrosh <boaz@plexistor.com>, Jan Kara <jack@suse.cz>,
	Mike Snitzer <snitzer@redhat.com>, Neil Brown <neilb@suse.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Chris Mason <clm@fb.com>, Paul Mackerras <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Alasdair Kergon <agk@redhat.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Jens Axboe <axboe@kernel.dk>, Theodore Ts'o <tytso@mit.edu>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Julia Lawall <Julia.Lawall@lip6.fr>, Tejun Heo <tj@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Date: Thu, 7 May 2015 16:42:25 +0200
Message-ID: <20150507144225.GA20491@gmail.com>
In-Reply-To: <20150507090217.GA4467@gmail.com>


* Ingo Molnar <mingo@kernel.org> wrote:

> [...]
>
> For anything more complex, that maps any of this storage to 
> user-space, or exposes it to higher level struct page based APIs, 
> etc., where references matter and it's more of a cache with 
> potentially multiple users, not an IO space, the natural API is 
> struct page.

Let me walk back on this:

> I'd say that this particular series mostly addresses the 'pfn as 
> sector_t' side of the equation, where persistent memory is IO space, 
> not memory space, and as such it is the more natural and thus also 
> the cheaper/faster approach.

... but that does not appear to be the case: this series replaces a 
'struct page' interface with a pure pfn interface for the express 
purpose of being able to DMA to/from 'memory areas' that are not 
struct page backed.

> Linus probably disagrees? :-)

[ and he'd disagree rightfully ;-) ]

So what this patch set tries to achieve is (sector_t -> sector_t) IO 
between storage devices (i.e. a rare and somewhat weird use case), and 
does it by squeezing one device's storage address into our formerly 
struct page backed descriptor, via a pfn.
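A rough userspace model of the "squeezing" described above may help. Everything in it is hypothetical (`struct io_vec`, `IO_VEC_PFN_FLAG`, the bit-0 tag) and it does not reproduce the actual __pfn_t layout from the series. The point is only that once a descriptor can hold either a page or a bare pfn, every consumer that used to dereference the page has to branch, and page-based operations such as taking a reference no longer apply to every entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int refcount;
};

/*
 * Hypothetical bio_vec-like entry: one 64-bit word holds either a
 * struct page pointer or a raw pfn, told apart by bit 0. This is NOT
 * the __pfn_t encoding from the series, just an illustration.
 */
#define IO_VEC_PFN_FLAG ((uint64_t)1)

struct io_vec {
	uint64_t data;
	unsigned int len;
	unsigned int offset;
};

struct io_vec io_vec_from_page(struct page *page, unsigned int len,
			       unsigned int offset)
{
	struct io_vec v = { (uint64_t)(uintptr_t)page, len, offset };

	/* struct page is int-aligned here, so bit 0 of its address is free. */
	assert((v.data & IO_VEC_PFN_FLAG) == 0);
	return v;
}

struct io_vec io_vec_from_pfn(uint64_t pfn, unsigned int len,
			      unsigned int offset)
{
	struct io_vec v = { (pfn << 1) | IO_VEC_PFN_FLAG, len, offset };
	return v;
}

bool io_vec_has_page(const struct io_vec *v)
{
	return (v->data & IO_VEC_PFN_FLAG) == 0;
}

/* Returns NULL for pfn-only entries: every caller must now handle that. */
struct page *io_vec_page(const struct io_vec *v)
{
	return io_vec_has_page(v) ? (struct page *)(uintptr_t)v->data : NULL;
}

uint64_t io_vec_pfn(const struct io_vec *v)
{
	assert(!io_vec_has_page(v));
	return v->data >> 1;
}

/* A consumer that relies on page references fails for pfn-only entries. */
bool io_vec_get_ref(const struct io_vec *v)
{
	struct page *page = io_vec_page(v);

	if (!page)
		return false;	/* no struct page, so nothing to refcount */
	page->refcount++;
	return true;
}
```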

That looks like a layering violation and a mistake to me. If we want 
to do direct (sector_t -> sector_t) IO, with no serialization worries, 
it should have its own (simple) API - which things like hierarchical 
RAID or RDMA APIs could use.
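A minimal sketch of what a separate, simple (sector_t -> sector_t) interface could look like. All names are hypothetical (`struct sector_copy_req`, `sector_copy()`, and an in-memory `struct mem_bdev` standing in for a block device). The request names two devices and two sector ranges directly, so no struct page or pfn has to be carried inside a memory descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t sector_t;	/* index of a 512-byte sector */
#define SECTOR_SIZE 512

/* Hypothetical block device modelled as a flat in-memory buffer. */
struct mem_bdev {
	unsigned char *data;
	sector_t nr_sectors;
};

/* Hypothetical device-to-device request: no struct page, no pfn. */
struct sector_copy_req {
	struct mem_bdev *src;
	sector_t src_sector;
	struct mem_bdev *dst;
	sector_t dst_sector;
	sector_t nr_sectors;
};

/* True if [start, start + nr) lies inside the device (overflow-safe). */
static int range_ok(const struct mem_bdev *bdev, sector_t start, sector_t nr)
{
	return start <= bdev->nr_sectors && nr <= bdev->nr_sectors - start;
}

/* Returns 0 on success, -1 if either range is out of bounds. */
int sector_copy(const struct sector_copy_req *req)
{
	if (!range_ok(req->src, req->src_sector, req->nr_sectors) ||
	    !range_ok(req->dst, req->dst_sector, req->nr_sectors))
		return -1;

	/* memmove() tolerates overlapping ranges on the same device. */
	memmove(req->dst->data + req->dst_sector * SECTOR_SIZE,
		req->src->data + req->src_sector * SECTOR_SIZE,
		req->nr_sectors * SECTOR_SIZE);
	return 0;
}
```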

If what we want to do is to support, say, an mmap() of a file on 
persistent storage, and then read() into that file from another device 
via DMA, then I think we should allocate struct page backing already 
at mmap() time, and all regular syscall APIs would 'just work' from 
that point on - far beyond what page-less, pfn-based APIs can do.

The temporary struct page backing can then be freed at munmap() time.
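The mmap()-time backing could be modelled roughly as below. This is a userspace sketch under stated assumptions: `struct pmem_mapping`, `pmem_mmap()`, `pmem_pfn_to_page()` and `pmem_munmap()` are hypothetical, and the refcount check in `pmem_munmap()` is a simplification of whatever pinning rules a real implementation would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal struct page: a pfn and a reference count. */
struct page {
	uint64_t pfn;
	int refcount;
};

/* Hypothetical mapping of a persistent-memory file range. */
struct pmem_mapping {
	uint64_t base_pfn;
	size_t nr_pages;
	struct page *pages;	/* temporary backing, live only while mapped */
};

/* mmap() time: allocate one struct page per pfn in the mapped range. */
int pmem_mmap(struct pmem_mapping *m, uint64_t base_pfn, size_t nr_pages)
{
	m->pages = calloc(nr_pages, sizeof(*m->pages));
	if (!m->pages)
		return -1;
	m->base_pfn = base_pfn;
	m->nr_pages = nr_pages;
	for (size_t i = 0; i < nr_pages; i++) {
		m->pages[i].pfn = base_pfn + i;
		m->pages[i].refcount = 1;	/* reference held by the mapping */
	}
	return 0;
}

/* While mapped, regular paths can look up an ordinary struct page. */
struct page *pmem_pfn_to_page(struct pmem_mapping *m, uint64_t pfn)
{
	if (!m->pages || pfn < m->base_pfn || pfn - m->base_pfn >= m->nr_pages)
		return NULL;
	return &m->pages[pfn - m->base_pfn];
}

/* munmap() time: free the backing, refusing while any page is pinned. */
int pmem_munmap(struct pmem_mapping *m)
{
	for (size_t i = 0; i < m->nr_pages; i++)
		if (m->pages[i].refcount != 1)
			return -1;	/* e.g. still pinned by in-flight DMA */
	free(m->pages);
	m->pages = NULL;
	m->nr_pages = 0;
	return 0;
}
```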

And if the usage is purely fd-based, we don't really have fd-to-fd APIs 
beyond the rarely used splice variants (and even those don't do pure 
cross-IO: they use a pipe as an intermediary), so I suspect there's no 
problem to solve.
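For reference, the splice route looks roughly like this from userspace: splice(2) requires a pipe on one end of every call, so a file-to-file copy becomes two hops, fd_in -> pipe -> fd_out. `splice()` and `pipe()` are the real Linux calls; `splice_copy()` is a hypothetical helper, and the sketch keeps error handling minimal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes from fd_in to fd_out with splice(2). splice()
 * needs a pipe on one end of each call, so data travels
 * fd_in -> pipe -> fd_out. Returns bytes copied (short on EOF) or -1.
 */
ssize_t splice_copy(int fd_in, int fd_out, size_t len)
{
	int pipefd[2];
	ssize_t total = 0;

	if (pipe(pipefd) < 0)
		return -1;

	while ((size_t)total < len) {
		/* Hop 1: source file into the (currently empty) pipe. */
		ssize_t in = splice(fd_in, NULL, pipefd[1], NULL,
				    len - (size_t)total, 0);
		if (in == 0)
			break;		/* EOF on fd_in */
		if (in < 0) {
			total = -1;
			goto done;
		}
		/* Hop 2: drain the pipe into the destination file. */
		while (in > 0) {
			ssize_t out = splice(pipefd[0], NULL, fd_out, NULL,
					     (size_t)in, 0);
			if (out <= 0) {
				total = -1;
				goto done;
			}
			in -= out;
			total += out;
		}
	}
done:
	close(pipefd[0]);
	close(pipefd[1]);
	return total;
}
```

The pipe is always drained before the next read hop, so each splice() into it has the full pipe buffer available.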

Thanks,

	Ingo
