From: Jan Kara <jack@suse.cz>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: "Jan Kara" <jack@suse.cz>, "Michal Suchánek" <msuchanek@suse.de>,
jack@suse.de, linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
linux-nvdimm@lists.01.org
Subject: Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
Date: Mon, 1 Jun 2020 12:09:25 +0200
Message-ID: <20200601100925.GC3960@quack2.suse.cz>
In-Reply-To: <7e8ee9e3-4d4d-e4b9-913b-1c2448adc62a@linux.ibm.com>
On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
> On 5/29/20 3:22 PM, Jan Kara wrote:
> > On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
> > > Thanks Michal. I also missed Jeff in this email thread.
> >
> > And I think you'll also need some of the sched maintainers for the prctl
> > bits...
> >
> > > On 5/29/20 3:03 PM, Michal Suchánek wrote:
> > > > Adding Jan
> > > >
> > > > On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> > > > > With POWER10, architecture is adding new pmem flush and sync instructions.
> > > > > The kernel should prevent the usage of MAP_SYNC if applications are not using
> > > > > the new instructions on newer hardware.
> > > > >
> > > > > This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
> > > > > the usage of MAP_SYNC. The kernel config option is added to allow the user
> > > > > to control whether MAP_SYNC should be enabled by default or not.
> > > > >
> > > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > ...
> > > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > > index 8c700f881d92..d5a9a363e81e 100644
> > > > > --- a/kernel/fork.c
> > > > > +++ b/kernel/fork.c
> > > > > @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
> > > > > static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
> > > > > +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> > > > > +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> > > > > +#else
> > > > > +unsigned long default_map_sync_mask = 0;
> > > > > +#endif
> > > > > +
> >
> > I'm not sure CONFIG is really the right approach here. For a distro that would
> > basically mean to disable MAP_SYNC for all PPC kernels unless application
> > explicitly uses the right prctl. Shouldn't we rather initialize
> > default_map_sync_mask on boot based on whether the CPU we run on requires
> > new flush instructions or not? Otherwise the patch looks sensible.
> >
>
> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
> But on a virtualized platform there is no easy way to detect that. We could
> ideally hook this into the nvdimm driver where we look at the new compat
> string ibm,persistent-memory-v2 and then disable MAP_SYNC
> if we find a device with the specific value.

Hum, couldn't we set some flag for nvdimm devices with the
"ibm,persistent-memory-v2" property and then check it at mmap(2) time?
When the device has this property and the mmap(2) caller doesn't have
the prctl set, we'd disallow MAP_SYNC. That should make things mostly
seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
applications need to be aware of new instructions so this isn't that much
additional burden...

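To make the idea concrete, here is a rough userspace model of the check
described above. It is only a sketch: the struct names, field names and
map_sync_allowed() are made up for illustration and do not come from the
patch or from the kernel. The real check would live somewhere in the
mmap(2) path and look at the real device and mm state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-device state, not a real kernel struct. */
struct pmem_dev_model {
	/* Assumed to be set when "ibm,persistent-memory-v2" is found. */
	bool needs_new_flush;
};

/* Hypothetical stand-in for per-process state, not a real kernel struct. */
struct mm_model {
	/* Assumed to be set once the task has issued the prctl. */
	bool map_sync_opt_in;
};

/* Return true when an mmap(2) request with MAP_SYNC should be allowed. */
static bool map_sync_allowed(const struct pmem_dev_model *dev,
			     const struct mm_model *mm)
{
	assert(dev && mm);

	/* Older devices: nothing changes, MAP_SYNC keeps working. */
	if (!dev->needs_new_flush)
		return true;

	/* Newer devices: only tasks that opted in via prctl get MAP_SYNC. */
	return mm->map_sync_opt_in;
}
```

With this shape, existing applications on older devices would see no
change, and only the combination of a new-style device and a task
without the prctl would be refused.
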
> With that I am wondering should we even have this patch? Can we expect
> userspace get updated to use new instruction?.
>
> With ppc64 we never had a real persistent memory device available for end
> user to try. The available persistent memory stack was using vPMEM which was
> presented as a volatile memory region for which there is no need to use any
> of the flush instructions. We could safely assume that as we get
> applications certified/verified for working with pmem device on ppc64, they
> would all be using the new instructions?

This is a bit of a gamble... I don't have too much trust in certification /
verification because only the "big players" may do powerfail testing
thoroughly enough that they'd uncover these problems. So the question
really is: How many apps are out there using MAP_SYNC on ppc64? Hopefully
not many given the HW didn't ship yet as you wrote but I have no real clue.
Similarly there's a question: How many app writers will read the manual for
the older ppc64 architecture and write apps that won't work reliably on
POWER10? Again, I have no idea.

So the prctl would be IMHO a nice safety belt but I'm not 100% certain it
will be needed...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR