Re: [Lsf-pc] [LSF/MM TOPIC] Un-addressable device memory and block/fs implications


From: Jan Kara <jack@suse.cz>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>, Jerome Glisse <jglisse@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Un-addressable device memory and block/fs implications
Date: Mon, 19 Dec 2016 09:46:57 +0100
Message-ID: <20161219084657.GA17598@quack2.suse.cz>
In-Reply-To: <87oa0cwoup.fsf@linux.vnet.ibm.com>

On Fri 16-12-16 08:40:38, Aneesh Kumar K.V wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > On Wed 14-12-16 12:15:14, Jerome Glisse wrote:
> > <snipped explanation that the device has the same capabilities as CPUs wrt
> > page handling>
> >
> >> > So won't it be easier to leave the pagecache page where it is and *copy* it
> >> > to the device? Can the device notify us *before* it is going to modify a
> >> > page, not just after it has modified it? Possibly if we just give it the
> >> > page read-only and it will have to ask CPU to get write permission? If yes,
> >> > then I believe this could work and even fs support should be doable.
> >> 
> >> Well, yes and no. The device obeys the same rules as the CPU, so if a
> >> file-backed page is mapped read-only in the process, it must first take a
> >> write fault, which calls into the fs (page_mkwrite() of vm_ops). But once a
> >> page has write permission, there is no way to be notified by hardware on
> >> every write. First, the hardware does not have the capability. Second, we
> >> are talking about thousands (10 000 is the upper range in today's devices)
> >> of concurrent threads, each of which can possibly write to the page under
> >> consideration.
> >
> > Sure, I meant whether the device is able to do the equivalent of a
> > ->page_mkwrite notification, which apparently it is. OK.
> >
> >> We really want the device page to behave just like a regular page. Most fs
> >> code paths never map file content; it only happens during read/write, and I
> >> believe this can be handled either by migrating back or by using a bounce
> >> page. I want to provide the choice between the two solutions, as one will be
> >> better for some workloads and the other for different workloads.
> >
> > I agree with keeping pages used by the device behaving as similarly as
> > possible to any other page. I'm just exploring different possibilities for
> > how to make that happen. E.g. the scheme I was aiming at is:
> >
> > When you want page A to be used by the device, you set up page A' in the
> > device but make sure any access to it will fault.
> >
> > When the device wants to access A', it notifies the CPU, which write-protects
> > all mappings of A, copies A to A', and maps A' read-only for the device.
> 
> 
> A and A' will have different pfns here and hence different struct pages.

Yes. In fact, I don't think there's a need to have a struct page for A' in my
scheme, at least for the purposes of page cache tracking... Maybe there's a
good reason to have it from a device driver POV.

> So what will be in the address_space->page_tree? If we place
> A' in the page cache, then we are essentially bringing in a lot of the
> locking complexity Dave talked about in previous mails.

No, I meant page A will stay in the page_tree. There's no need for
migration in my scheme.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
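
The scheme discussed in this message can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions, not
kernel code: the names AddressSpace, CachePage, DeviceCopy, CpuMapping and
handle_device_fault are hypothetical and exist purely for illustration. It
shows page A staying in page_tree the whole time, while A' is filled by a
copy on the first device access, after all CPU mappings of A have been
write-protected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PAGE_SIZE = 4096


@dataclass
class CpuMapping:
    """One CPU page-table entry pointing at page A (hypothetical model)."""
    writable: bool = True


@dataclass
class CachePage:
    """Page A: the page cache page, which is never migrated."""
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))
    mappings: List[CpuMapping] = field(default_factory=list)


@dataclass
class DeviceCopy:
    """Page A': device-local copy, not tracked by the page cache."""
    data: Optional[bytes] = None  # None models "any device access faults"
    writable: bool = False


class AddressSpace:
    """Toy stand-in for address_space; page_tree only ever holds page A."""

    def __init__(self) -> None:
        self.page_tree: Dict[int, CachePage] = {}
        self.device: Dict[int, DeviceCopy] = {}

    def setup_device_page(self, index: int) -> None:
        # Step 1: set up A' in the device so that any access to it faults.
        self.device[index] = DeviceCopy()

    def handle_device_fault(self, index: int) -> None:
        # Step 2: the device notified the CPU. Write-protect all mappings
        # of A, copy A to A', and map A' read-only for the device.
        page = self.page_tree[index]
        for mapping in page.mappings:
            mapping.writable = False
        copy = self.device[index]
        copy.data = bytes(page.data)
        copy.writable = False

    def device_read(self, index: int) -> bytes:
        copy = self.device[index]
        if copy.data is None:
            self.handle_device_fault(index)
        return copy.data
```

In this model a device read of an unpopulated A' triggers the fault path
once; afterwards the device reads its own copy, and the page_tree entry is
still the original page A. Getting write permission for the device would
need a further notification step, which the sketch leaves out.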
