* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 16:52 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522154501.GL2149@nanopsycho>
On Tue, May 22, 2018 at 05:45:01PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 05:32:30PM CEST, mst@redhat.com wrote:
> >On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
> >> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
> >> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
> >> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
> >> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
> >> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> >> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> >> >> >> >>Use the registration/notification framework supported by the generic
> >> >> >> >> >>failover infrastructure.
> >> >> >> >> >>
> >> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> >> >
> >> >> >> >> >In previous patchset versions, the common code did
> >> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >> >> >> >
> >> >> >> >> >This should be part of the common "failover" code.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> >> >> >> IFF_FAILOVER_SLAVE should be used.
> >> >> >> >
> >> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
> >> >> >>
> >> >> >> No. IFF_SLAVE is for bonding.
> >> >> >
> >> >> >What breaks if we reuse it for failover?
> >> >>
> >> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
> >> >> And failover slave is not a bonding slave.
> >> >
> >> >That does not really answer the question. I'd claim it's sufficiently
> >> >like a bond slave for IFF_SLAVE to make sense.
> >> >
> >> >In fact you will find that netvsc already sets IFF_SLAVE, and so
> >>
> >> netvsc does the whole failover thing in a wrong way. This patchset is
> >> trying to fix it.
> >
> >Maybe, but we don't need gratuitous changes either, especially if they
> >break userspace.
>
> What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
> the first place, lets fix it. If some userspace depends on that flag, it
> is broken anyway.
>
>
> >
> >> >does e.g. the eql driver.
> >> >
> >> >The advantage of using IFF_SLAVE is that userspace knows to skip it. If
> >>
> >> The userspace should know how to skip other types of slaves - team,
> >> bridge, ovs, etc.
> >> The "master link" should be the one to look at.
> >>
> >
> >How should existing userspace know which ones to skip and which one is
> >the master? Right now userspace seems to assume whatever does not have
> >IFF_SLAVE should be looked at. Are you saying that's not the right thing
>
> Why do you say so? What do you mean by "looked at"? Certainly not.
> IFLA_MASTER is the attribute that should be looked at, nothing else.
>
>
> >to do and userspace should be fixed? What should userspace do in
> >your opinion that will be forward compatible with future kernels?
> >
> >>
> >> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
> >>
> >> Each master type has a IFF_ master flag and IFF_ slave flag.
> >
> >Could you give some examples please?
>
> enum netdev_priv_flags {
> IFF_EBRIDGE = 1<<1,
> IFF_BRIDGE_PORT = 1<<9,
> IFF_OPENVSWITCH = 1<<20,
> IFF_OVS_DATAPATH = 1<<10,
> IFF_L3MDEV_MASTER = 1<<18,
> IFF_L3MDEV_SLAVE = 1<<21,
> IFF_TEAM = 1<<22,
> IFF_TEAM_PORT = 1<<13,
> };
That's not in uapi, is it? The comment above it says:
"These flags are invisible to userspace"
>
> >
> >> In private
> >> flag. I don't see no reason to break this pattern here.
> >
> >Other masters are setup from userspace, this one is set up automatically
> >by kernel. So the bar is higher, we need an interface that existing
> >userspace knows about. We can't just say "oh if userspace set this up
> >it should know to skip lowerdevs".
> >
> >Otherwise multiple interfaces with same mac tend to confuse userspace.
>
> No difference, really.
> Regardless who does the setup, and independent userspace deamon should
> react accordingly.
If the daemon does the setup itself, it's reasonable to require that it
learns about new flags each time we add a new driver. If it doesn't,
then I think it's less reasonable.
--
MST
* Re: [RFC V4 PATCH 7/8] vhost: packed ring support
From: Wei Xu @ 2018-05-22 16:54 UTC (permalink / raw)
To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1526473941-16199-8-git-send-email-jasowang@redhat.com>
On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 3 +-
> drivers/vhost/vhost.c | 539 ++++++++++++++++++++++++++++++++++++++++++++++----
> drivers/vhost/vhost.h | 8 +-
> 3 files changed, 513 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 8304c30..f2a0f5b 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> break;
> }
> vq->last_avail_idx = s.num;
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + vq->avail_wrap_counter = s.num >> 31;
> /* Forget the cached index value. */
> vq->avail_idx = vq->last_avail_idx;
> break;
> @@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> s.num = vq->last_avail_idx;
> if (copy_to_user(argp, &s, sizeof s))
> r = -EFAULT;
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + s.num |= vq->avail_wrap_counter << 31;
> break;
> case VHOST_SET_VRING_ADDR:
> if (copy_from_user(&a, argp, sizeof a)) {
'last_used_idx' also needs to be saved/restored here.
I have figured out the root cause of the broken device after reloading the
'virtio-net' module: all indices are reset on a reload, but
'last_used_idx' is not properly reset in this case. This confuses
handle_rx()/handle_tx().
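As a rough illustration of the encoding the quoted hunks rely on, the sketch below packs a ring index and a one-bit wrap counter into a single 32-bit value, with the wrap counter in bit 31, and unpacks them again. The helper names (pack_vring_base, unpack_idx, unpack_wrap) are made up for this example and do not come from the patch; a used-side index would presumably need the same kind of treatment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration, not part of the vhost code. */
#define WRAP_SHIFT 31u

/* Combine a ring index with its wrap counter (stored in bit 31). */
static uint32_t pack_vring_base(uint32_t idx, bool wrap)
{
	return (idx & ~(1u << WRAP_SHIFT)) | ((uint32_t)wrap << WRAP_SHIFT);
}

/* Recover the index by masking off bit 31. */
static uint32_t unpack_idx(uint32_t num)
{
	return num & ~(1u << WRAP_SHIFT);
}

/* Recover the wrap counter from bit 31. */
static bool unpack_wrap(uint32_t num)
{
	return (num >> WRAP_SHIFT) & 1u;
}
```

Doing the shift on an unsigned 32-bit value avoids shifting a signed 1 into the sign bit.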
Wei
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 17:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522194633-mutt-send-email-mst@kernel.org>
Tue, May 22, 2018 at 06:52:21PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 05:45:01PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 05:32:30PM CEST, mst@redhat.com wrote:
>> >On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
>> >> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
>> >> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
>> >> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>> >> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>> >> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> >> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> >> >> >> >>Use the registration/notification framework supported by the generic
>> >> >> >> >> >>failover infrastructure.
>> >> >> >> >> >>
>> >> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >> >> >> >
>> >> >> >> >> >In previous patchset versions, the common code did
>> >> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >> >> >> >
>> >> >> >> >> >This should be part of the common "failover" code.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> >> >> >> IFF_FAILOVER_SLAVE should be used.
>> >> >> >> >
>> >> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>> >> >> >>
>> >> >> >> No. IFF_SLAVE is for bonding.
>> >> >> >
>> >> >> >What breaks if we reuse it for failover?
>> >> >>
>> >> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
>> >> >> And failover slave is not a bonding slave.
>> >> >
>> >> >That does not really answer the question. I'd claim it's sufficiently
>> >> >like a bond slave for IFF_SLAVE to make sense.
>> >> >
>> >> >In fact you will find that netvsc already sets IFF_SLAVE, and so
>> >>
>> >> netvsc does the whole failover thing in a wrong way. This patchset is
>> >> trying to fix it.
>> >
>> >Maybe, but we don't need gratuitous changes either, especially if they
>> >break userspace.
>>
>> What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
>> the first place, lets fix it. If some userspace depends on that flag, it
>> is broken anyway.
>>
>>
>> >
>> >> >does e.g. the eql driver.
>> >> >
>> >> >The advantage of using IFF_SLAVE is that userspace knows to skip it. If
>> >>
>> >> The userspace should know how to skip other types of slaves - team,
>> >> bridge, ovs, etc.
>> >> The "master link" should be the one to look at.
>> >>
>> >
>> >How should existing userspace know which ones to skip and which one is
>> >the master? Right now userspace seems to assume whatever does not have
>> >IFF_SLAVE should be looked at. Are you saying that's not the right thing
>>
>> Why do you say so? What do you mean by "looked at"? Certainly not.
>> IFLA_MASTER is the attribute that should be looked at, nothing else.
>>
>>
>> >to do and userspace should be fixed? What should userspace do in
>> >your opinion that will be forward compatible with future kernels?
>> >
>> >>
>> >> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
>> >>
>> >> Each master type has a IFF_ master flag and IFF_ slave flag.
>> >
>> >Could you give some examples please?
>>
>> enum netdev_priv_flags {
>> IFF_EBRIDGE = 1<<1,
>> IFF_BRIDGE_PORT = 1<<9,
>> IFF_OPENVSWITCH = 1<<20,
>> IFF_OVS_DATAPATH = 1<<10,
>> IFF_L3MDEV_MASTER = 1<<18,
>> IFF_L3MDEV_SLAVE = 1<<21,
>> IFF_TEAM = 1<<22,
>> IFF_TEAM_PORT = 1<<13,
>> };
>
>That's not in uapi, is it? the comment above that says:
Correct.
>
>These flags are invisible to userspace
>
>
>
>>
>> >
>> >> In private
>> >> flag. I don't see no reason to break this pattern here.
>> >
>> >Other masters are setup from userspace, this one is set up automatically
>> >by kernel. So the bar is higher, we need an interface that existing
>> >userspace knows about. We can't just say "oh if userspace set this up
>> >it should know to skip lowerdevs".
>> >
>> >Otherwise multiple interfaces with same mac tend to confuse userspace.
>>
>> No difference, really.
>> Regardless who does the setup, and independent userspace deamon should
>> react accordingly.
>
>If the deamon does the setup itself, it's reasonable to require that it
>learns about new flags each time we add a new driver. If it doesn't,
>then I think it's less reasonable.
No need. The "IFLA_MASTER" attr is always there to be looked at. That is
enough.
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 19:54 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522173844.GP2149@nanopsycho>
On Tue, May 22, 2018 at 07:38:44PM +0200, Jiri Pirko wrote:
> >> >> In private
> >> >> flag. I don't see no reason to break this pattern here.
> >> >
> >> >Other masters are setup from userspace, this one is set up automatically
> >> >by kernel. So the bar is higher, we need an interface that existing
> >> >userspace knows about. We can't just say "oh if userspace set this up
> >> >it should know to skip lowerdevs".
> >> >
> >> >Otherwise multiple interfaces with same mac tend to confuse userspace.
> >>
> >> No difference, really.
> >> Regardless who does the setup, and independent userspace deamon should
> >> react accordingly.
> >
> >If the deamon does the setup itself, it's reasonable to require that it
> >learns about new flags each time we add a new driver. If it doesn't,
> >then I think it's less reasonable.
>
> No need. The "IFLA_MASTER" attr is always there to be looked at. That is
> enough.
Oh, so if it has a master, skip it? Sorry, I misunderstood what you were
saying earlier.
Thanks, this makes sense to me.
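For illustration only, the "if it has a master, skip it" rule could look roughly like this in a userspace daemon that parses RTM_NEWLINK messages. This is a minimal sketch: link_has_master() is a hypothetical helper name, and it only walks an attribute area that the caller has already received over a netlink socket.

```c
#include <linux/if_link.h>
#include <linux/rtnetlink.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the attribute area of a link message carries
 * IFLA_MASTER, i.e. the link is enslaved to some upper device.
 * link_has_master() is a hypothetical helper, not an existing API.
 */
static bool link_has_master(const struct rtattr *rta, int len)
{
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_MASTER)
			return true;
	}
	return false;
}
```

A caller would typically pass IFLA_RTA(ifi) and IFLA_PAYLOAD(nlh) for a received message and ignore any link for which this returns true, regardless of which kind of master it is.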
--
MST
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-22 20:54 UTC (permalink / raw)
To: Jiri Pirko, Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, netdev, virtualization,
loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180522161246.GN2149@nanopsycho>
On 5/22/2018 9:12 AM, Jiri Pirko wrote:
> Fixing the subj, sorry about that.
>
> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>> failover infrastructure.
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>> In previous patchset versions, the common code did
>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>
>>>>>> This should be part of the common "failover" code.
>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>> :( This kind of degrades the whole efford of having single solution
>>> in "failover" module. I think that common parts, as
>>> netdev_rx_handler_register() and others certainly is should be inside
>>> the common module. This is not a good time to minimize changes. Let's do
>>> the thing properly and fix the netvsc mess now.
>>>
>>>
>>>> It should be possible to move some of the code from net_failover.c to generic
>>>> failover.c in future if Stephen is ok with it.
>>>>
>>>>
>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>> IFF_FAILOVER_SLAVE should be used.
>>>> Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>>>> in patch 3.
>>> The existing netvsc driver.
>> We really can't change netvsc's flags now, even if it's interface is
>> messy, it's being used in the field. We can add a flag that makes netvsc
>> behave differently, and if this flag also allows enhanced functionality
>> userspace will gradually switch.
> Okay, although in this case, it really does not make much sense, so be
> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
> now. (This once-wrong-forever-wrong policy is flustrating me).
>
> But since this patchset introduces private flag IFF_FAILOVER and
> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
> netdevice to get at least some consistency between virtio_net and
> netvsc.
OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
register/unregister routines so that it is consistent with virtio_net.
Based on your discussion with mst, I think we can even remove the IFF_SLAVE
setting on netvsc, as it should not impact userspace. If Stephen is OK with it,
we can make this change too.
Do you see any other items that need to be resolved for this series to go in
during this merge window?
>
>> Anything breaking userspace I fully expect Stephen to nack and
>> IMO with good reason.
>>
>> --
>> MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [RFC V4 PATCH 7/8] vhost: packed ring support
From: Jason Wang @ 2018-05-23 1:39 UTC (permalink / raw)
To: Wei Xu; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <20180522165448.GA13523@wei-ubt>
On 2018年05月23日 00:54, Wei Xu wrote:
> On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 3 +-
>> drivers/vhost/vhost.c | 539 ++++++++++++++++++++++++++++++++++++++++++++++----
>> drivers/vhost/vhost.h | 8 +-
>> 3 files changed, 513 insertions(+), 37 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 8304c30..f2a0f5b 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>> break;
>> }
>> vq->last_avail_idx = s.num;
>> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>> + vq->avail_wrap_counter = s.num >> 31;
>> /* Forget the cached index value. */
>> vq->avail_idx = vq->last_avail_idx;
>> break;
>> @@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>> s.num = vq->last_avail_idx;
>> if (copy_to_user(argp, &s, sizeof s))
>> r = -EFAULT;
>> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>> + s.num |= vq->avail_wrap_counter << 31;
>> break;
>> case VHOST_SET_VRING_ADDR:
>> if (copy_from_user(&a, argp, sizeof a)) {
> 'last_used_idx' also needs to be saved/restored here.
>
> I have figured out the root cause of broken device after reloading
> 'virtio-net' module, all indices have been reset for a reloading but
> 'last_used_idx' is not properly reset in this case. This confuses
> handle_rx()/tx().
>
> Wei
>
Good catch, so we probably need a new ioctl to sync between qemu and vhost.
Something like VHOST_SET/GET_USED_BASE.
Thanks
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-23 6:27 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, Michael S. Tsirkin, kubakici,
netdev, virtualization, loseweigh, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <8f611f3b-88d7-4d41-fd47-d07f11d0f25a@intel.com>
Tue, May 22, 2018 at 10:54:29PM CEST, sridhar.samudrala@intel.com wrote:
>
>
>On 5/22/2018 9:12 AM, Jiri Pirko wrote:
>> Fixing the subj, sorry about that.
>>
>> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> > On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>> > > Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> > > > > Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> > > > > > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> > > > > > > Use the registration/notification framework supported by the generic
>> > > > > > > failover infrastructure.
>> > > > > > >
>> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > > > > > In previous patchset versions, the common code did
>> > > > > > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> > > > > > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> > > > > >
>> > > > > > This should be part of the common "failover" code.
>> > > > Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>> > > > netvsc and only commonize the notifier and the main event handler routine.
>> > > > Another complication is that netvsc does part of registration in a delayed workqueue.
>> > > :( This kind of degrades the whole efford of having single solution
>> > > in "failover" module. I think that common parts, as
>> > > netdev_rx_handler_register() and others certainly is should be inside
>> > > the common module. This is not a good time to minimize changes. Let's do
>> > > the thing properly and fix the netvsc mess now.
>> > >
>> > >
>> > > > It should be possible to move some of the code from net_failover.c to generic
>> > > > failover.c in future if Stephen is ok with it.
>> > > >
>> > > >
>> > > > > Also note that in the current patchset you use IFF_FAILOVER flag for
>> > > > > master, yet for the slave you use IFF_SLAVE. That is wrong.
>> > > > > IFF_FAILOVER_SLAVE should be used.
>> > > > Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>> > > > in patch 3.
>> > > The existing netvsc driver.
>> > We really can't change netvsc's flags now, even if it's interface is
>> > messy, it's being used in the field. We can add a flag that makes netvsc
>> > behave differently, and if this flag also allows enhanced functionality
>> > userspace will gradually switch.
>> Okay, although in this case, it really does not make much sense, so be
>> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
>> now. (This once-wrong-forever-wrong policy is flustrating me).
>>
>> But since this patchset introduces private flag IFF_FAILOVER and
>> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
>> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
>> netdevice to get at least some consistency between virtio_net and
>> netvsc.
>
>OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
>register/unregister routines so that it is consistent with virtio_net.
>
>Based on your discussion with mst, i think we can even remove IFF_SLAVE
>setting on netvsc as it should not impact userspace. If Stephen is OK
>we can make this change too.
>
>Do you see any other items that need to be resolved for this series to go in
>this merge window?
As I wrote previously, the common parts, including rx_handler registration
and the setting of flags and the master link, should be done in common code,
moved out of the netvsc code.
Thanks.
>
>
>
>>
>> > Anything breaking userspace I fully expect Stephen to nack and
>> > IMO with good reason.
>> >
>> > --
>> > MST
>
* Re: [RFC V4 PATCH 7/8] vhost: packed ring support
From: Wei Xu @ 2018-05-23 7:17 UTC (permalink / raw)
To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <e12d4055-6ae6-4d00-ae8b-1acd88633f48@redhat.com>
On Wed, May 23, 2018 at 09:39:28AM +0800, Jason Wang wrote:
>
>
> On 2018年05月23日 00:54, Wei Xu wrote:
> >On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
> >>Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>---
> >> drivers/vhost/net.c | 3 +-
> >> drivers/vhost/vhost.c | 539 ++++++++++++++++++++++++++++++++++++++++++++++----
> >> drivers/vhost/vhost.h | 8 +-
> >> 3 files changed, 513 insertions(+), 37 deletions(-)
> >>
> >>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>index 8304c30..f2a0f5b 100644
> >>--- a/drivers/vhost/vhost.c
> >>+++ b/drivers/vhost/vhost.c
> >>@@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >> break;
> >> }
> >> vq->last_avail_idx = s.num;
> >>+ if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+ vq->avail_wrap_counter = s.num >> 31;
> >> /* Forget the cached index value. */
> >> vq->avail_idx = vq->last_avail_idx;
> >> break;
> >>@@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >> s.num = vq->last_avail_idx;
> >> if (copy_to_user(argp, &s, sizeof s))
> >> r = -EFAULT;
> >>+ if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+ s.num |= vq->avail_wrap_counter << 31;
> >> break;
> >> case VHOST_SET_VRING_ADDR:
> >> if (copy_from_user(&a, argp, sizeof a)) {
> >'last_used_idx' also needs to be saved/restored here.
> >
> >I have figured out the root cause of broken device after reloading
> >'virtio-net' module, all indices have been reset for a reloading but
> >'last_used_idx' is not properly reset in this case. This confuses
> >handle_rx()/tx().
> >
> >Wei
> >
>
> Good catch, so we probably need a new ioctl to sync between qemu and vhost.
>
> Something like VHOST_SET/GET_USED_BASE.
Sure, or can we expand 'vhost_vring_state' so that they are all handled together?
>
> Thanks
>
* Re: [RFC V4 PATCH 7/8] vhost: packed ring support
From: Jason Wang @ 2018-05-23 8:57 UTC (permalink / raw)
To: Wei Xu; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <20180523071727.GA13373@wei-ubt>
On 2018年05月23日 15:17, Wei Xu wrote:
> On Wed, May 23, 2018 at 09:39:28AM +0800, Jason Wang wrote:
>>
>> On 2018年05月23日 00:54, Wei Xu wrote:
>>> On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>> drivers/vhost/net.c | 3 +-
>>>> drivers/vhost/vhost.c | 539 ++++++++++++++++++++++++++++++++++++++++++++++----
>>>> drivers/vhost/vhost.h | 8 +-
>>>> 3 files changed, 513 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>> index 8304c30..f2a0f5b 100644
>>>> --- a/drivers/vhost/vhost.c
>>>> +++ b/drivers/vhost/vhost.c
>>>> @@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>>>> break;
>>>> }
>>>> vq->last_avail_idx = s.num;
>>>> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>> + vq->avail_wrap_counter = s.num >> 31;
>>>> /* Forget the cached index value. */
>>>> vq->avail_idx = vq->last_avail_idx;
>>>> break;
>>>> @@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>>>> s.num = vq->last_avail_idx;
>>>> if (copy_to_user(argp, &s, sizeof s))
>>>> r = -EFAULT;
>>>> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>> + s.num |= vq->avail_wrap_counter << 31;
>>>> break;
>>>> case VHOST_SET_VRING_ADDR:
>>>> if (copy_from_user(&a, argp, sizeof a)) {
>>> 'last_used_idx' also needs to be saved/restored here.
>>>
>>> I have figured out the root cause of broken device after reloading
>>> 'virtio-net' module, all indices have been reset for a reloading but
>>> 'last_used_idx' is not properly reset in this case. This confuses
>>> handle_rx()/tx().
>>>
>>> Wei
>>>
>> Good catch, so we probably need a new ioctl to sync between qemu and vhost.
>>
>> Something like VHOST_SET/GET_USED_BASE.
> Sure, or can we expand 'vhost_vring_state' to keep them done in a bunch?
It's part of uapi, so we can't.
Thanks
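To illustrate one reason an existing uapi structure cannot simply grow: ioctl request numbers built with the _IOW()/_IOWR() macros encode the size of the argument structure, so enlarging the struct changes the request number, while existing binaries keep passing the old layout. The sketch below uses two stand-in structs rather than the real header: old_state mirrors a two-field index/num layout, new_state and its extra field are hypothetical, and 0xAF/0x12 are used only as example type and number values.

```c
#include <linux/ioctl.h>

/* Stand-in for the existing two-field layout. */
struct old_state {
	unsigned int index;
	unsigned int num;
};

/* Hypothetical extended layout with one more field. */
struct new_state {
	unsigned int index;
	unsigned int num;
	unsigned int used;
};

/* Same type and number, different argument struct. */
#define OLD_GET_BASE _IOWR(0xAF, 0x12, struct old_state)
#define NEW_GET_BASE _IOWR(0xAF, 0x12, struct new_state)

/* The size field of the request number tracks sizeof() of the argument. */
static int request_numbers_differ(void)
{
	return OLD_GET_BASE != NEW_GET_BASE;
}
```

This is why a separate ioctl pair is the usual way to carry additional state.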
>
>> Thanks
>>
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-23 16:16 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, Michael S. Tsirkin, kubakici,
netdev, virtualization, loseweigh, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180523062748.GA3155@nanopsycho>
On 5/22/2018 11:27 PM, Jiri Pirko wrote:
> Tue, May 22, 2018 at 10:54:29PM CEST, sridhar.samudrala@intel.com wrote:
>>
>> On 5/22/2018 9:12 AM, Jiri Pirko wrote:
>>> Fixing the subj, sorry about that.
>>>
>>> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>>>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>>>> failover infrastructure.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>>>> In previous patchset versions, the common code did
>>>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>>>
>>>>>>>> This should be part of the common "failover" code.
>>>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>>>> :( This kind of degrades the whole efford of having single solution
>>>>> in "failover" module. I think that common parts, as
>>>>> netdev_rx_handler_register() and others certainly is should be inside
>>>>> the common module. This is not a good time to minimize changes. Let's do
>>>>> the thing properly and fix the netvsc mess now.
>>>>>
>>>>>
>>>>>> It should be possible to move some of the code from net_failover.c to generic
>>>>>> failover.c in future if Stephen is ok with it.
>>>>>>
>>>>>>
>>>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>>>> IFF_FAILOVER_SLAVE should be used.
>>>>>> Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>>>>>> in patch 3.
>>>>> The existing netvsc driver.
>>>> We really can't change netvsc's flags now, even if it's interface is
>>>> messy, it's being used in the field. We can add a flag that makes netvsc
>>>> behave differently, and if this flag also allows enhanced functionality
>>>> userspace will gradually switch.
>>> Okay, although in this case, it really does not make much sense, so be
>>> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
>>> now. (This once-wrong-forever-wrong policy is flustrating me).
>>>
>>> But since this patchset introduces private flag IFF_FAILOVER and
>>> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
>>> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
>>> netdevice to get at least some consistency between virtio_net and
>>> netvsc.
>> OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
>> register/unregister routines so that it is consistent with virtio_net.
>>
>> Based on your discussion with mst, i think we can even remove IFF_SLAVE
>> setting on netvsc as it should not impact userspace. If Stephen is OK
>> we can make this change too.
>>
>> Do you see any other items that need to be resolved for this series to go in
>> this merge window?
> As I wrote previously, the common code including rx_handler registration
> and setting of flags and master link should be done in a common code,
> moved away from netvsc code.
>
This requires re-introducing the 2 additional ops, pre_register and pre_unregister,
that I removed in the last couple of revisions to minimize the netvsc changes and the
indirect calls that Stephen expressed some concern about.
But, as these calls don't happen in the hot path, I guess it should not be a big
issue and is the right way to go.
Will submit a v12 with these updates.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
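The flag handling agreed on in this message (IFF_FAILOVER on the master, IFF_FAILOVER_SLAVE set and cleared on the VF in the register/unregister routines) can be sketched as follows. This is only an illustrative user-space model: `struct demo_netdev`, the `demo_*` helpers and the bit positions are hypothetical and are not taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; the real flags are
 * defined among the kernel's netdev private flags. */
#define DEMO_IFF_FAILOVER       (1u << 0)
#define DEMO_IFF_FAILOVER_SLAVE (1u << 1)

/* Minimal stand-in for struct net_device. */
struct demo_netdev {
	uint32_t priv_flags;
};

/* Mark the master device (netvsc or virtio_net in the discussion). */
static void demo_failover_register(struct demo_netdev *master)
{
	master->priv_flags |= DEMO_IFF_FAILOVER;
}

/* Common-code hook run when a VF is enslaved to a failover master. */
static void demo_slave_register(const struct demo_netdev *master,
				struct demo_netdev *slave)
{
	assert(master->priv_flags & DEMO_IFF_FAILOVER);
	slave->priv_flags |= DEMO_IFF_FAILOVER_SLAVE;
}

/* Common-code hook run when the VF is released again. */
static void demo_slave_unregister(struct demo_netdev *slave)
{
	slave->priv_flags &= ~DEMO_IFF_FAILOVER_SLAVE;
}

static bool demo_is_failover_slave(const struct demo_netdev *dev)
{
	return (dev->priv_flags & DEMO_IFF_FAILOVER_SLAVE) != 0;
}
```

Keeping both helpers in one place is what gives the consistency between virtio_net and netvsc that the thread asks for: each driver only calls the common routines and never touches the slave flag itself.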
* Re: [PATCH net V2 0/4] Fix several issues of virtio-net mergeable XDP
From: David Miller @ 2018-05-23 17:37 UTC (permalink / raw)
To: jasowang; +Cc: netdev, virtualization, linux-kernel, mst
In-Reply-To: <1526960671-11782-1-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 22 May 2018 11:44:27 +0800
> Please review the patches that try to fix several issues of
> virtio-net mergeable XDP.
>
> Changes from V1:
> - check against 1 before decreasing instead of resetting to 1
> - typo fixes
Series applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH RFCv2 0/4] virtio-mem: paravirtualized memory
From: David Hildenbrand @ 2018-05-23 18:27 UTC (permalink / raw)
To: linux-mm
Cc: Michal Hocko, KVM, Michael S. Tsirkin, Heiko Carstens,
qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
Dan Williams, Andrea Arcangeli, virtio-dev@lists.oasis-open.org,
Pavel Tatashin, Halil Pasic, Len Brown, qemu-s390x,
Stefan Hajnoczi, Thomas Gleixner, Andrew Morton, Vlastimil Babka,
Greg Kroah-Hartman, Cornelia Huck, Rafael J. Wysocki,
linux-kernel
In-Reply-To: <20180523182404.11433-1-david@redhat.com>
On 23.05.2018 20:24, David Hildenbrand wrote:
> This is the Linux driver side of virtio-mem. Compared to the QEMU side,
> it is in a pretty complete and clean state.
>
> virtio-mem is a paravirtualized mechanism of adding/removing memory to/from
> a VM. We can do this on a 4MB granularity right now. In Linux, all
> memory is added to the ZONE_NORMAL, so unplugging cannot be guaranteed -
> but will be more likely to succeed compared to unplugging 128MB+ chunks.
> We might implement some optimizations in that area in the future that will
> make memory unplug more reliable.
>
> For now, this is an easy way to give a VM access to more memory and
> eventually to remove some memory again. I am testing it on x86 and
> s390x (under QEMU TCG so far only).
>
> This is the follow up on [1], but the concept, user interface and
> virtio protocol have been heavily changed. I am only including the important
> parts in this cover letter (because otherwise nobody will read it). Please
> feel free to ask in case there are any questions.
>
> This series is based on [4] and shows how it is being used. It contains
> further information. Also have a look at the description of patch nr 4 in
> this series.
>
> This work is the result of the initial idea of Andrea Arcangeli to host
> enforce guest access to memory inflated in virtio-balloon using
> userfaultfd, which turned out to be problematic to implement. That's how
> I came up with virtio-mem.
>
> --------------------------------------------------------------------------
> 1. High level concept
> --------------------------------------------------------------------------
>
> Each virtio-mem device owns a memory region in the physical address space.
> The guest is allowed to plug and online up to 'requested_size' of memory.
> It will not be allowed to plug more than that size. Unplugged memory will
> be protected by configurable mechanisms (e.g. random discard, userfaultfd
> protection, etc.). virtio-mem is designed in a way that a guest may never
> assume to be able to even read unplugged memory. This is a big difference
> to classical balloon drivers.
>
> The usable memory region might grow over time, so not all parts of the
> device memory region might be usable from the start. This is an
> optimization to allow a smarter implementation in the hypervisor (reduce
> size of dirty bitmaps, size of memory regions ...).
>
> When the device driver starts up, it will query 'requested_size' and start
> to add memory to the system. This memory is not indicated e.g. via ACPI,
> so unmodified systems will not silently try to use unplugged memory that
> they are not supposed to touch.
>
> Updates on the 'requested_size' indicate hypervisor requests to plug or
> unplug memory.
>
> As each virtio-mem device can belong to a NUMA node, we can easily
> plug/unplug memory on a NUMA basis. And of course, we can have several
> independent virtio-mem devices for a VM.
>
> The idea is *not* to add new virtio-mem devices when hotplugging memory,
> the idea is to resize (grow/shrink) virtio-mem devices.
>
> --------------------------------------------------------------------------
> 2. Benefits
> --------------------------------------------------------------------------
>
> Guest side:
> - Increase memory usable by Linux in 4MB steps (vs. section size like 128MB
> on x86 or 2GB on e.g. some arm if I'm not mistaken)
> - Remove struct pages once all 4MB chunks of a section are offline (in
> contrast to all balloon drivers where this never happens)
> - Don't fragment memory, while still being able to unplug smaller chunks
> than ordinary DIMM sizes.
> - Memory hotplug support for architectures that have no proper interface
> (e.g. s390x misses the external notification part) or e.g. QEMU/Linux
> support is complicated to implement.
> - Automatic management of onlining/offlining in the device driver -
> no manual interaction from an admin/tool necessary.
>
> QEMU side:
> - Resizing (plug/unplug) has a single interface - in contrast to a mixture
> of ACPI and virtio-balloon. See the example below.
> - Migration works out of the box - no need to specify new DIMMs or new
> sizes on the migration target. It simply works.
> - We can resize in arbitrary steps and sizes (in contrast to e.g. ACPI,
> where we have to know upfront in which granularity we later on want to
> remove memory or even how much memory we eventually want to add to our
> guest)
> - One interface to rule them (architectures) all :)
>
> --------------------------------------------------------------------------
> 3. Reboot handling
> --------------------------------------------------------------------------
>
> After a reboot, all memory is unplugged. This allows the hypervisor
> to see if support for virtio-mem is available in the freshly booted system.
> This way we could charge only for the actually "plugged" memory size. And
> it avoids having to sense for plugged memory in the guest.
>
> E.g. on every size change of a virtio-mem device, we can notify management
> layers. So we can track how much memory a VM has plugged.
>
> --------------------------------------------------------------------------
> 4. Example
> --------------------------------------------------------------------------
>
> (not including resizable memory regions on the QEMU side yet, so don't
> focus on that part - it will consume a lot of memory right now for e.g.
> dirty bitmaps and memory slot tracking data)
>
> Start QEMU with two virtio-mem devices that provide little memory initially.
> $ qemu-system-x86_64 -m 4G,maxmem=504G \
> -smp sockets=2,cores=2 \
> [...]
> -object memory-backend-ram,id=mem0,size=256G \
> -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,size=4160M \
> -object memory-backend-ram,id=mem1,size=256G \
> -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,size=3G
>
> Query the configuration ('size' tells us the guest driver is active):
> (qemu) info memory-devices
> info memory-devices
> Memory device [virtio-mem]: "vm0"
> phys-addr: 0x140000000
> node: 0
> requested-size: 4362076160
> size: 4362076160
> max-size: 274877906944
> block-size: 4194304
> memdev: /objects/mem0
> Memory device [virtio-mem]: "vm1"
> phys-addr: 0x4140000000
> node: 1
> requested-size: 3221225472
> size: 3221225472
> max-size: 274877906944
> block-size: 4194304
> memdev: /objects/mem1
>
> Change the size of a virtio-mem device:
> (qemu) memory-device-resize vm0 40960
> memory-device-resize vm0 40960
> ...
> (qemu) info memory-devices
> info memory-devices
> Memory device [virtio-mem]: "vm0"
> phys-addr: 0x140000000
> node: 0
> requested-size: 42949672960
> size: 42949672960
> max-size: 274877906944
> block-size: 4194304
> memdev: /objects/mem0
> ...
>
> Try to unplug memory (KASAN active in the guest - a lot of memory wasted):
> (qemu) memory-device-resize vm0 1024
> memory-device-resize vm0 1024
> ...
> (qemu) info memory-devices
> info memory-devices
> Memory device [virtio-mem]: "vm0"
> phys-addr: 0x140000000
> node: 0
> requested-size: 1073741824
> size: 6169821184
> max-size: 274877906944
> block-size: 4194304
> memdev: /objects/mem0
> ...
>
> I am sharing for now only the linux driver side. The current code can be
> found at [2]. The QEMU side is still heavily WIP, the current QEMU
> prototype can be found at [3].
>
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html
> [2] https://github.com/davidhildenbrand/linux/tree/virtio-mem
> [3] https://github.com/davidhildenbrand/qemu/tree/virtio-mem
> [4] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1698014.html
>
> David Hildenbrand (4):
> ACPI: NUMA: export pxm_to_node
> s390: mm: support removal of memory
> s390: numa: implement memory_add_physaddr_to_nid()
> virtio-mem: paravirtualized memory
>
> arch/s390/mm/init.c | 18 +-
> arch/s390/numa/numa.c | 12 +
> drivers/acpi/numa.c | 1 +
> drivers/virtio/Kconfig | 15 +
> drivers/virtio/Makefile | 1 +
> drivers/virtio/virtio_mem.c | 1040 +++++++++++++++++++++++++++++++
> include/uapi/linux/virtio_ids.h | 1 +
> include/uapi/linux/virtio_mem.h | 134 ++++
> 8 files changed, 1216 insertions(+), 6 deletions(-)
> create mode 100644 drivers/virtio/virtio_mem.c
> create mode 100644 include/uapi/linux/virtio_mem.h
>
cc-ing some further mailing lists
--
Thanks,
David / dhildenb
^ permalink raw reply
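The sizes in the example of the cover letter above can be checked with a little arithmetic. The sketch below assumes that the argument of `memory-device-resize` is in MiB (which matches the reported `requested-size` values) and that the guest works in whole blocks of `block-size` bytes; the `demo_*` names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIB (1024ull * 1024ull)

/* Convert a size given in MiB to bytes. */
static uint64_t demo_mib_to_bytes(uint64_t mib)
{
	return mib * DEMO_MIB;
}

/*
 * Number of blocks the guest still has to plug (positive) or unplug
 * (negative) so that 'size' reaches 'requested_size'.
 */
static int64_t demo_blocks_to_plug(uint64_t size, uint64_t requested_size,
				   uint64_t block_size)
{
	assert(block_size != 0);
	assert(size % block_size == 0 && requested_size % block_size == 0);

	if (requested_size >= size)
		return (int64_t)((requested_size - size) / block_size);
	return -(int64_t)((size - requested_size) / block_size);
}
```

With the values shown above, growing vm0 from 4160 MiB to 40960 MiB corresponds to 9200 blocks of 4 MiB, and in the unplug attempt the guest is still 1215 blocks away from the requested 1024 MiB (`size` 6169821184 vs. `requested-size` 1073741824).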
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-05-23 18:50 UTC (permalink / raw)
To: Anshuman Khandual
Cc: robh, benh, linux-kernel, virtualization, hch, mpe, joe,
linuxppc-dev, elfring, david
In-Reply-To: <20180522063317.20956-1-khandual@linux.vnet.ibm.com>
subj: s/virito/virtio/
On Tue, May 22, 2018 at 12:03:17PM +0530, Anshuman Khandual wrote:
> This adds a hook which a platform can define in order to allow it to
> force the use of the DMA API for all virtio devices even if they don't
> have the VIRTIO_F_IOMMU_PLATFORM flag set. We want to use this to do
> bounce-buffering of data on the new secure pSeries platform, currently
> under development, where a KVM host cannot access all of the memory
> space of a secure KVM guest. The host can only access the pages which
> the guest has explicitly requested to be shared with the host, thus
> the virtio implementation in the guest has to copy data to and from
> shared pages.
>
> With this hook, the platform code in the secure guest can force the
> use of swiotlb for virtio buffers, with a back-end for swiotlb which
> will use a pool of pre-allocated shared pages. Thus all data being
> sent or received by virtio devices will be copied through pages which
> the host has access to.
>
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
> Changes in V2:
>
> The arch callback has been enabled through a weak symbol definition
> so that it is enabled only for those architectures subscribing to
> this new framework. Clarified the patch description. The primary
> objective for this RFC has been to get an in principle agreement
> on this approach.
>
> Original V1:
>
> Original RFC and discussions https://patchwork.kernel.org/patch/10324405/
I re-read that discussion and I'm still unclear on the
original question, since I got several apparently
conflicting answers.
I asked:
Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
hypervisor side sufficient?
> arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
> arch/powerpc/platforms/pseries/iommu.c | 11 +++++++++++
> drivers/virtio/virtio_ring.c | 10 ++++++++++
> 3 files changed, 27 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa3945..056e578 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> #define ARCH_HAS_DMA_MMAP_COHERENT
>
> #endif /* __KERNEL__ */
> +
> +#define platform_forces_virtio_dma platform_forces_virtio_dma
> +
> +struct virtio_device;
> +
> +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> #endif /* _ASM_DMA_MAPPING_H */
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..a2ec15a 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -38,6 +38,7 @@
> #include <linux/of.h>
> #include <linux/iommu.h>
> #include <linux/rculist.h>
> +#include <linux/virtio.h>
> #include <asm/io.h>
> #include <asm/prom.h>
> #include <asm/rtas.h>
> @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> __setup("multitce=", disable_multitce);
>
> machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> +
> +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> +{
> + /*
> + * On protected guest platforms, force virtio core to use DMA
> + * MAP API for all virtio devices. But there can also be some
> + * exceptions for individual devices like virtio balloon.
> + */
> + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> +}
Isn't this kind of slow? vring_use_dma_api is on
data path and supposed to be very fast.
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 21d464a..47ea6c3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -141,8 +141,18 @@ struct vring_virtqueue {
> * unconditionally on data path.
> */
>
> +#ifndef platform_forces_virtio_dma
> +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> +{
> + return false;
> +}
> +#endif
> +
> static bool vring_use_dma_api(struct virtio_device *vdev)
> {
> + if (platform_forces_virtio_dma(vdev))
> + return true;
> +
> if (!virtio_has_iommu_quirk(vdev))
> return true;
>
> --
> 2.9.3
^ permalink raw reply
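One way to address the data-path concern raised above is to evaluate the platform property once and let the fast-path check read a cached value. The following is only a user-space sketch of that idea, not the patch author's code: `demo_slow_platform_lookup()` is a hypothetical stand-in for the device-tree search, and all `demo_*` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the slow device-tree lookup; counts calls. */
static int demo_lookup_calls;

static bool demo_slow_platform_lookup(void)
{
	demo_lookup_calls++;
	return true;	/* pretend the platform node was found */
}

static bool demo_forces_dma;		/* cached answer */
static bool demo_forces_dma_valid;	/* set once the cache is filled */

/* Run once, e.g. from platform initialization code. */
static void demo_platform_init(void)
{
	demo_forces_dma = demo_slow_platform_lookup();
	demo_forces_dma_valid = true;
}

/* Fast-path check: a plain load, no tree walk. */
static bool demo_platform_forces_virtio_dma(void)
{
	assert(demo_forces_dma_valid);
	return demo_forces_dma;
}
```

In the kernel, a node returned by of_find_compatible_node() also carries a reference that is normally dropped with of_node_put(), so doing the lookup once at init time would make that easier to handle as well.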
* [PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
Changes:
- patch v3:
- Update on message to describe longer term PIE goal.
- Minor change on ftrace if condition.
- Changed code using xchgq.
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce dynamic relocation space on
mapped memory. It also simplifies the relocation process.
- Move the start of the module section next to the kernel. Remove the need for
-mcmodel=large on modules. Extends module space from 1 to 2G maximum.
- Support for XEN PVH as 32-bit relocations can be ignored with
--emit-relocs.
- Support for GOT relocations previously done automatically with -pie.
- Remove need for dynamic PLT in modules.
- Support dynamic GOT for modules.
- rfc v2:
- Add support for global stack cookie while compiler defaults to fs without
mcmodel=kernel
- Change patch 7 to correctly jump out of the identity mapping on kexec load
preserve.
These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows to optionally extend the
KASLR randomization range from 1G to 3G. The chosen range is the one currently
available, future changes will allow the kernel module to have a wider
randomization range.
Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.
The patches:
- 1-3, 5-13, 18-19: Change in assembly code to be PIE compliant.
- 4: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
- 14: Adapt percpu design to work correctly when PIE is enabled.
- 15: Provide an option to default visibility to hidden except for key symbols.
It removes errors between compilation units.
- 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
reduce GOT footprint.
- 17: Adapt relocation tool to handle PIE binary correctly.
- 20: Add support for global cookie.
- 21: Support ftrace with PIE (used on Ubuntu config).
- 22: Add option to move the module section just after the kernel.
- 23: Adapt module loading to support PIE with dynamic GOT.
- 24: Make the GOT read-only.
- 25: Add the CONFIG_X86_PIE option (off by default).
- 26: Adapt relocation tool to generate a 64-bit relocation table.
- 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
from 1G to 3G (off by default).
Performance/Size impact:
Size of vmlinux (Default configuration):
File size:
- PIE disabled: +0.18%
- PIE enabled: -1.977% (less relocations)
.text section:
- PIE disabled: same
- PIE enabled: same
Size of vmlinux (Ubuntu configuration):
File size:
- PIE disabled: +0.21%
- PIE enabled: +10%
.text section:
- PIE disabled: same
- PIE enabled: +0.001%
The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.
Hackbench (50% and 1600% on thread/process for pipe/sockets):
- PIE disabled: no significant change (avg -/+ 0.5% on latest test).
- PIE enabled: between -1% and +1% on average (default and Ubuntu config).
Kernbench (average of 10 Half and Optimal runs):
Elapsed Time:
- PIE disabled: no significant change (avg -0.5%)
- PIE enabled: average -0.5% to +0.5%
System Time:
- PIE disabled: no significant change (avg -0.1%)
- PIE enabled: average -0.4% to +0.4%.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303
diffstat:
Documentation/x86/x86_64/mm.txt | 3
arch/x86/Kconfig | 45 ++++++
arch/x86/Makefile | 58 ++++++++
arch/x86/boot/boot.h | 2
arch/x86/boot/compressed/Makefile | 5
arch/x86/boot/compressed/misc.c | 10 +
arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++--
arch/x86/crypto/aesni-intel_asm.S | 8 -
arch/x86/crypto/aesni-intel_avx-x86_64.S | 6
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 42 +++---
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 +++---
arch/x86/crypto/camellia-x86_64-asm_64.S | 8 -
arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 ++++---
arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 +++---
arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++-----
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4
arch/x86/crypto/glue_helper-asm-avx.S | 4
arch/x86/crypto/glue_helper-asm-avx2.S | 6
arch/x86/crypto/sha256-avx2-asm.S | 23 ++-
arch/x86/entry/calling.h | 2
arch/x86/entry/entry_32.S | 3
arch/x86/entry/entry_64.S | 25 ++-
arch/x86/include/asm/asm.h | 1
arch/x86/include/asm/bug.h | 2
arch/x86/include/asm/ftrace.h | 6
arch/x86/include/asm/jump_label.h | 8 -
arch/x86/include/asm/kvm_host.h | 8 -
arch/x86/include/asm/module.h | 11 +
arch/x86/include/asm/page_64_types.h | 9 +
arch/x86/include/asm/paravirt_types.h | 12 +
arch/x86/include/asm/percpu.h | 25 ++-
arch/x86/include/asm/pgtable_64_types.h | 6
arch/x86/include/asm/pm-trace.h | 2
arch/x86/include/asm/processor.h | 16 +-
arch/x86/include/asm/sections.h | 8 +
arch/x86/include/asm/setup.h | 2
arch/x86/include/asm/stackprotector.h | 19 ++
arch/x86/kernel/Makefile | 6
arch/x86/kernel/acpi/wakeup_64.S | 31 ++--
arch/x86/kernel/asm-offsets.c | 3
arch/x86/kernel/asm-offsets_32.c | 3
arch/x86/kernel/asm-offsets_64.c | 3
arch/x86/kernel/cpu/common.c | 3
arch/x86/kernel/cpu/microcode/core.c | 4
arch/x86/kernel/ftrace.c | 42 +++++-
arch/x86/kernel/head64.c | 23 ++-
arch/x86/kernel/head_32.S | 3
arch/x86/kernel/head_64.S | 41 +++++-
arch/x86/kernel/kvm.c | 6
arch/x86/kernel/module.c | 181 ++++++++++++++++++++++++++-
arch/x86/kernel/module.lds | 3
arch/x86/kernel/process.c | 5
arch/x86/kernel/relocate_kernel_64.S | 16 +-
arch/x86/kernel/setup_percpu.c | 5
arch/x86/kernel/vmlinux.lds.S | 13 +
arch/x86/kvm/svm.c | 4
arch/x86/lib/cmpxchg16b_emu.S | 8 -
arch/x86/mm/dump_pagetables.c | 3
arch/x86/power/hibernate_asm_64.S | 4
arch/x86/tools/relocs.c | 169 +++++++++++++++++++++++--
arch/x86/tools/relocs.h | 4
arch/x86/tools/relocs_common.c | 15 +-
arch/x86/xen/xen-asm.S | 12 -
arch/x86/xen/xen-head.S | 11 -
arch/x86/xen/xen-pvh.S | 13 +
drivers/base/firmware_loader/main.c | 4
include/asm-generic/sections.h | 6
include/asm-generic/vmlinux.lds.h | 12 +
include/linux/compiler.h | 7 +
init/Kconfig | 16 ++
kernel/kallsyms.c | 16 +-
kernel/trace/trace.h | 4
lib/dynamic_debug.c | 4
scripts/link-vmlinux.sh | 14 ++
74 files changed, 1070 insertions(+), 315 deletions(-)
^ permalink raw reply
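The cover letter above attributes part of the size increase to losing the 32-bit signed relocation available with mcmodel=kernel. A small sketch of the underlying range check, under the assumption that such a relocation encodes the absolute address as a 32-bit value that is sign-extended to 64 bits, while RIP-relative code only needs the distance to the symbol to fit in 32 bits. The `demo_*` helpers are hypothetical and for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'addr' can be encoded as a sign-extended 32-bit immediate,
 * i.e. it lies in the lowest or the topmost 2G of the address space. */
static bool demo_fits_simm32(uint64_t addr)
{
	int64_t v = (int64_t)addr;

	return v >= INT32_MIN && v <= INT32_MAX;
}

/* True if 'target' is reachable from 'next_ip' with a signed 32-bit
 * displacement, as used by RIP-relative addressing. */
static bool demo_fits_rel32(uint64_t next_ip, uint64_t target)
{
	int64_t disp = (int64_t)(target - next_ip);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

An address at the -2G boundary (0xffffffff80000000) passes the first check, one at -3G (0xffffffff40000000) does not, yet two symbols that both live around -3G still satisfy the second check. This is why relative references keep working once the kernel is relocated below the top 2G.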
* [PATCH v3 01/27] x86/crypto: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/crypto/aes-x86_64-asm_64.S | 45 +++++----
arch/x86/crypto/aesni-intel_asm.S | 8 +-
arch/x86/crypto/aesni-intel_avx-x86_64.S | 6 +-
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 42 ++++-----
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 ++++-----
arch/x86/crypto/camellia-x86_64-asm_64.S | 8 +-
arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 +++++-----
arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 +++++----
arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++++++-------
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
arch/x86/crypto/glue_helper-asm-avx.S | 4 +-
arch/x86/crypto/glue_helper-asm-avx2.S | 6 +-
arch/x86/crypto/sha256-avx2-asm.S | 23 +++--
13 files changed, 221 insertions(+), 159 deletions(-)
diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S
index 8739cf7795de..86fa068e5e81 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -48,8 +48,12 @@
#define R10 %r10
#define R11 %r11
+/* Hold global for PIE support */
+#define RBASE %r12
+
#define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
ENTRY(FUNC); \
+ pushq RBASE; \
movq r1,r2; \
leaq KEY+48(r8),r9; \
movq r10,r11; \
@@ -74,54 +78,63 @@
movl r6 ## E,4(r9); \
movl r7 ## E,8(r9); \
movl r8 ## E,12(r9); \
+ popq RBASE; \
ret; \
ENDPROC(FUNC);
+#define round_mov(tab_off, reg_i, reg_o) \
+ leaq tab_off(%rip), RBASE; \
+ movl (RBASE,reg_i,4), reg_o;
+
+#define round_xor(tab_off, reg_i, reg_o) \
+ leaq tab_off(%rip), RBASE; \
+ xorl (RBASE,reg_i,4), reg_o;
+
#define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
movzbl r2 ## H,r5 ## E; \
movzbl r2 ## L,r6 ## E; \
- movl TAB+1024(,r5,4),r5 ## E;\
+ round_mov(TAB+1024, r5, r5 ## E)\
movw r4 ## X,r2 ## X; \
- movl TAB(,r6,4),r6 ## E; \
+ round_mov(TAB, r6, r6 ## E) \
roll $16,r2 ## E; \
shrl $16,r4 ## E; \
movzbl r4 ## L,r7 ## E; \
movzbl r4 ## H,r4 ## E; \
xorl OFFSET(r8),ra ## E; \
xorl OFFSET+4(r8),rb ## E; \
- xorl TAB+3072(,r4,4),r5 ## E;\
- xorl TAB+2048(,r7,4),r6 ## E;\
+ round_xor(TAB+3072, r4, r5 ## E)\
+ round_xor(TAB+2048, r7, r6 ## E)\
movzbl r1 ## L,r7 ## E; \
movzbl r1 ## H,r4 ## E; \
- movl TAB+1024(,r4,4),r4 ## E;\
+ round_mov(TAB+1024, r4, r4 ## E)\
movw r3 ## X,r1 ## X; \
roll $16,r1 ## E; \
shrl $16,r3 ## E; \
- xorl TAB(,r7,4),r5 ## E; \
+ round_xor(TAB, r7, r5 ## E) \
movzbl r3 ## L,r7 ## E; \
movzbl r3 ## H,r3 ## E; \
- xorl TAB+3072(,r3,4),r4 ## E;\
- xorl TAB+2048(,r7,4),r5 ## E;\
+ round_xor(TAB+3072, r3, r4 ## E)\
+ round_xor(TAB+2048, r7, r5 ## E)\
movzbl r1 ## L,r7 ## E; \
movzbl r1 ## H,r3 ## E; \
shrl $16,r1 ## E; \
- xorl TAB+3072(,r3,4),r6 ## E;\
- movl TAB+2048(,r7,4),r3 ## E;\
+ round_xor(TAB+3072, r3, r6 ## E)\
+ round_mov(TAB+2048, r7, r3 ## E)\
movzbl r1 ## L,r7 ## E; \
movzbl r1 ## H,r1 ## E; \
- xorl TAB+1024(,r1,4),r6 ## E;\
- xorl TAB(,r7,4),r3 ## E; \
+ round_xor(TAB+1024, r1, r6 ## E)\
+ round_xor(TAB, r7, r3 ## E) \
movzbl r2 ## H,r1 ## E; \
movzbl r2 ## L,r7 ## E; \
shrl $16,r2 ## E; \
- xorl TAB+3072(,r1,4),r3 ## E;\
- xorl TAB+2048(,r7,4),r4 ## E;\
+ round_xor(TAB+3072, r1, r3 ## E)\
+ round_xor(TAB+2048, r7, r4 ## E)\
movzbl r2 ## H,r1 ## E; \
movzbl r2 ## L,r2 ## E; \
xorl OFFSET+8(r8),rc ## E; \
xorl OFFSET+12(r8),rd ## E; \
- xorl TAB+1024(,r1,4),r3 ## E;\
- xorl TAB(,r2,4),r4 ## E;
+ round_xor(TAB+1024, r1, r3 ## E)\
+ round_xor(TAB, r2, r4 ## E)
#define move_regs(r1,r2,r3,r4) \
movl r3 ## E,r1 ## E; \
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index e762ef417562..4df029aa5fc1 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2610,7 +2610,7 @@ ENDPROC(aesni_cbc_dec)
*/
.align 4
_aesni_inc_init:
- movaps .Lbswap_mask, BSWAP_MASK
+ movaps .Lbswap_mask(%rip), BSWAP_MASK
movaps IV, CTR
PSHUFB_XMM BSWAP_MASK CTR
mov $1, TCTR_LOW
@@ -2738,12 +2738,12 @@ ENTRY(aesni_xts_crypt8)
cmpb $0, %cl
movl $0, %ecx
movl $240, %r10d
- leaq _aesni_enc4, %r11
- leaq _aesni_dec4, %rax
+ leaq _aesni_enc4(%rip), %r11
+ leaq _aesni_dec4(%rip), %rax
cmovel %r10d, %ecx
cmoveq %rax, %r11
- movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+ movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
movups (IVP), IV
mov 480(KEYP), KLEN
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..488605b19fe8 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -454,7 +454,8 @@ _get_AAD_rest0\@:
vpshufb and an array of shuffle masks */
movq %r12, %r11
salq $4, %r11
- movdqu aad_shift_arr(%r11), \T1
+ leaq aad_shift_arr(%rip), %rax
+ movdqu (%rax,%r11,), \T1
vpshufb \T1, reg_i, reg_i
_get_AAD_rest_final\@:
vpshufb SHUF_MASK(%rip), reg_i, reg_i
@@ -1761,7 +1762,8 @@ _get_AAD_rest0\@:
vpshufb and an array of shuffle masks */
movq %r12, %r11
salq $4, %r11
- movdqu aad_shift_arr(%r11), \T1
+ leaq aad_shift_arr(%rip), %rax
+ movdqu (%rax,%r11,), \T1
vpshufb \T1, reg_i, reg_i
_get_AAD_rest_final\@:
vpshufb SHUF_MASK(%rip), reg_i, reg_i
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index a14af6eb09cb..f94ec9a5552b 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -53,10 +53,10 @@
/* \
* S-function with AES subbytes \
*/ \
- vmovdqa .Linv_shift_row, t4; \
- vbroadcastss .L0f0f0f0f, t7; \
- vmovdqa .Lpre_tf_lo_s1, t0; \
- vmovdqa .Lpre_tf_hi_s1, t1; \
+ vmovdqa .Linv_shift_row(%rip), t4; \
+ vbroadcastss .L0f0f0f0f(%rip), t7; \
+ vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
@@ -69,8 +69,8 @@
vpshufb t4, x6, x6; \
\
/* prefilter sboxes 1, 2 and 3 */ \
- vmovdqa .Lpre_tf_lo_s4, t2; \
- vmovdqa .Lpre_tf_hi_s4, t3; \
+ vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+ vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x1, t0, t1, t7, t6); \
@@ -84,8 +84,8 @@
filter_8bit(x6, t2, t3, t7, t6); \
\
/* AES subbytes + AES shift rows */ \
- vmovdqa .Lpost_tf_lo_s1, t0; \
- vmovdqa .Lpost_tf_hi_s1, t1; \
+ vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4, x0, x0; \
vaesenclast t4, x7, x7; \
vaesenclast t4, x1, x1; \
@@ -96,16 +96,16 @@
vaesenclast t4, x6, x6; \
\
/* postfilter sboxes 1 and 4 */ \
- vmovdqa .Lpost_tf_lo_s3, t2; \
- vmovdqa .Lpost_tf_hi_s3, t3; \
+ vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+ vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
- vmovdqa .Lpost_tf_lo_s2, t4; \
- vmovdqa .Lpost_tf_hi_s2, t5; \
+ vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+ vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
@@ -444,7 +444,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vmovdqu .Lshufb_16x16b, a0; \
+ vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -483,7 +483,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vmovq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 16(rio), x0, y7; \
vpxor 1 * 16(rio), x0, y6; \
@@ -534,7 +534,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vmovq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
@@ -1017,7 +1017,7 @@ ENTRY(camellia_ctr_16way)
subq $(16 * 16), %rsp;
movq %rsp, %rax;
- vmovdqa .Lbswap128_mask, %xmm14;
+ vmovdqa .Lbswap128_mask(%rip), %xmm14;
/* load IV and byteswap */
vmovdqu (%rcx), %xmm0;
@@ -1066,7 +1066,7 @@ ENTRY(camellia_ctr_16way)
/* inpack16_pre: */
vmovq (key_table)(CTX), %xmm15;
- vpshufb .Lpack_bswap, %xmm15, %xmm15;
+ vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
vpxor %xmm0, %xmm15, %xmm0;
vpxor %xmm1, %xmm15, %xmm1;
vpxor %xmm2, %xmm15, %xmm2;
@@ -1134,7 +1134,7 @@ camellia_xts_crypt_16way:
subq $(16 * 16), %rsp;
movq %rsp, %rax;
- vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+ vmovdqa .Lxts_gf128mul_and_shl1_mask(%rip), %xmm14;
/* load IV */
vmovdqu (%rcx), %xmm0;
@@ -1210,7 +1210,7 @@ camellia_xts_crypt_16way:
/* inpack16_pre: */
vmovq (key_table)(CTX, %r8, 8), %xmm15;
- vpshufb .Lpack_bswap, %xmm15, %xmm15;
+ vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
vpxor 0 * 16(%rax), %xmm15, %xmm0;
vpxor %xmm1, %xmm15, %xmm1;
vpxor %xmm2, %xmm15, %xmm2;
@@ -1265,7 +1265,7 @@ ENTRY(camellia_xts_enc_16way)
*/
xorl %r8d, %r8d; /* input whitening key, 0 for enc */
- leaq __camellia_enc_blk16, %r9;
+ leaq __camellia_enc_blk16(%rip), %r9;
jmp camellia_xts_crypt_16way;
ENDPROC(camellia_xts_enc_16way)
@@ -1283,7 +1283,7 @@ ENTRY(camellia_xts_dec_16way)
movl $24, %eax;
cmovel %eax, %r8d; /* input whitening key, last for dec */
- leaq __camellia_dec_blk16, %r9;
+ leaq __camellia_dec_blk16(%rip), %r9;
jmp camellia_xts_crypt_16way;
ENDPROC(camellia_xts_dec_16way)
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index b66bbfa62f50..11bbaa1cd4a7 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -70,12 +70,12 @@
/* \
* S-function with AES subbytes \
*/ \
- vbroadcasti128 .Linv_shift_row, t4; \
- vpbroadcastd .L0f0f0f0f, t7; \
- vbroadcasti128 .Lpre_tf_lo_s1, t5; \
- vbroadcasti128 .Lpre_tf_hi_s1, t6; \
- vbroadcasti128 .Lpre_tf_lo_s4, t2; \
- vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+ vbroadcasti128 .Linv_shift_row(%rip), t4; \
+ vpbroadcastd .L0f0f0f0f(%rip), t7; \
+ vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+ vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+ vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+ vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
@@ -121,8 +121,8 @@
vinserti128 $1, t2##_x, x6, x6; \
vextracti128 $1, x1, t3##_x; \
vextracti128 $1, x4, t2##_x; \
- vbroadcasti128 .Lpost_tf_lo_s1, t0; \
- vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+ vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+ vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4##_x, x2##_x, x2##_x; \
vaesenclast t4##_x, t6##_x, t6##_x; \
vinserti128 $1, t6##_x, x2, x2; \
@@ -137,16 +137,16 @@
vinserti128 $1, t2##_x, x4, x4; \
\
/* postfilter sboxes 1 and 4 */ \
- vbroadcasti128 .Lpost_tf_lo_s3, t2; \
- vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+ vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+ vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
- vbroadcasti128 .Lpost_tf_lo_s2, t4; \
- vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+ vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+ vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
@@ -483,7 +483,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti128 .Lshufb_16x16b, a0; \
+ vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -522,7 +522,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vpbroadcastq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 32(rio), x0, y7; \
vpxor 1 * 32(rio), x0, y6; \
@@ -573,7 +573,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vpbroadcastq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
@@ -1113,7 +1113,7 @@ ENTRY(camellia_ctr_32way)
vmovdqu (%rcx), %xmm0;
vmovdqa %xmm0, %xmm1;
inc_le128(%xmm0, %xmm15, %xmm14);
- vbroadcasti128 .Lbswap128_mask, %ymm14;
+ vbroadcasti128 .Lbswap128_mask(%rip), %ymm14;
vinserti128 $1, %xmm0, %ymm1, %ymm0;
vpshufb %ymm14, %ymm0, %ymm13;
vmovdqu %ymm13, 15 * 32(%rax);
@@ -1159,7 +1159,7 @@ ENTRY(camellia_ctr_32way)
/* inpack32_pre: */
vpbroadcastq (key_table)(CTX), %ymm15;
- vpshufb .Lpack_bswap, %ymm15, %ymm15;
+ vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
vpxor %ymm0, %ymm15, %ymm0;
vpxor %ymm1, %ymm15, %ymm1;
vpxor %ymm2, %ymm15, %ymm2;
@@ -1243,13 +1243,13 @@ camellia_xts_crypt_32way:
subq $(16 * 32), %rsp;
movq %rsp, %rax;
- vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0, %ymm12;
+ vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0(%rip), %ymm12;
/* load IV and construct second IV */
vmovdqu (%rcx), %xmm0;
vmovdqa %xmm0, %xmm15;
gf128mul_x_ble(%xmm0, %xmm12, %xmm13);
- vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1, %ymm13;
+ vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1(%rip), %ymm13;
vinserti128 $1, %xmm0, %ymm15, %ymm0;
vpxor 0 * 32(%rdx), %ymm0, %ymm15;
vmovdqu %ymm15, 15 * 32(%rax);
@@ -1326,7 +1326,7 @@ camellia_xts_crypt_32way:
/* inpack32_pre: */
vpbroadcastq (key_table)(CTX, %r8, 8), %ymm15;
- vpshufb .Lpack_bswap, %ymm15, %ymm15;
+ vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
vpxor 0 * 32(%rax), %ymm15, %ymm0;
vpxor %ymm1, %ymm15, %ymm1;
vpxor %ymm2, %ymm15, %ymm2;
@@ -1384,7 +1384,7 @@ ENTRY(camellia_xts_enc_32way)
xorl %r8d, %r8d; /* input whitening key, 0 for enc */
- leaq __camellia_enc_blk32, %r9;
+ leaq __camellia_enc_blk32(%rip), %r9;
jmp camellia_xts_crypt_32way;
ENDPROC(camellia_xts_enc_32way)
@@ -1402,7 +1402,7 @@ ENTRY(camellia_xts_dec_32way)
movl $24, %eax;
cmovel %eax, %r8d; /* input whitening key, last for dec */
- leaq __camellia_dec_blk32, %r9;
+ leaq __camellia_dec_blk32(%rip), %r9;
jmp camellia_xts_crypt_32way;
ENDPROC(camellia_xts_dec_32way)
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 95ba6956a7f6..ef1137406959 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -92,11 +92,13 @@
#define RXORbl %r9b
#define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+ leaq T0(%rip), tmp1; \
movzbl ab ## bl, tmp2 ## d; \
+ xorq (tmp1, tmp2, 8), dst; \
+ leaq T1(%rip), tmp2; \
movzbl ab ## bh, tmp1 ## d; \
- rorq $16, ab; \
- xorq T0(, tmp2, 8), dst; \
- xorq T1(, tmp1, 8), dst;
+ xorq (tmp2, tmp1, 8), dst; \
+ rorq $16, ab;
/**********************************************************************
1-way camellia
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 86107c961bb4..64eb5c87d04a 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- shrq $16, src; \
- movl s1(, RID1, 4), dst ## d; \
- op1 s2(, RID2, 4), dst ## d; \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- interleave_op(il_reg); \
- op2 s3(, RID1, 4), dst ## d; \
- op3 s4(, RID2, 4), dst ## d;
+ movzbl src ## bh, RID1d; \
+ leaq s1(%rip), RID2; \
+ movl (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s2(%rip), RID1; \
+ op1 (RID1, RID2, 4), dst ## d; \
+ shrq $16, src; \
+ movzbl src ## bh, RID1d; \
+ leaq s3(%rip), RID2; \
+ op2 (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s4(%rip), RID1; \
+ op3 (RID1, RID2, 4), dst ## d; \
+ interleave_op(il_reg);
#define dummy(d) /* do nothing */
@@ -166,15 +170,15 @@
subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
#define enc_preload_rkr() \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR;
#define dec_preload_rkr() \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR; \
- vpshufb .Lbswap128_mask, RKR, RKR;
+ vpshufb .Lbswap128_mask(%rip), RKR, RKR;
#define transpose_2x4(x0, x1, t0, t1) \
vpunpckldq x1, x0, t0; \
@@ -251,9 +255,9 @@ __cast5_enc_blk16:
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
enc_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -287,7 +291,7 @@ __cast5_enc_blk16:
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -325,9 +329,9 @@ __cast5_dec_blk16:
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
dec_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -358,7 +362,7 @@ __cast5_dec_blk16:
round(RL, RR, 1, 2);
round(RR, RL, 0, 1);
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
popq %rbx;
popq %r15;
@@ -521,8 +525,8 @@ ENTRY(cast5_ctr_16way)
vpcmpeqd RKR, RKR, RKR;
vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
- vmovdqa .Lbswap_iv_mask, R1ST;
- vmovdqa .Lbswap128_mask, RKM;
+ vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+ vmovdqa .Lbswap128_mask(%rip), RKM;
/* load IV and byteswap */
vmovq (%rcx), RX;
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 7f30b6f0d72c..da1b7e4a23e4 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- shrq $16, src; \
- movl s1(, RID1, 4), dst ## d; \
- op1 s2(, RID2, 4), dst ## d; \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- interleave_op(il_reg); \
- op2 s3(, RID1, 4), dst ## d; \
- op3 s4(, RID2, 4), dst ## d;
+ movzbl src ## bh, RID1d; \
+ leaq s1(%rip), RID2; \
+ movl (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s2(%rip), RID1; \
+ op1 (RID1, RID2, 4), dst ## d; \
+ shrq $16, src; \
+ movzbl src ## bh, RID1d; \
+ leaq s3(%rip), RID2; \
+ op2 (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s4(%rip), RID1; \
+ op3 (RID1, RID2, 4), dst ## d; \
+ interleave_op(il_reg);
#define dummy(d) /* do nothing */
@@ -190,10 +194,10 @@
qop(RD, RC, 1);
#define shuffle(mask) \
- vpshufb mask, RKR, RKR;
+ vpshufb mask(%rip), RKR, RKR;
#define preload_rkr(n, do_mask, mask) \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor (kr+n*16)(CTX), RKR, RKR; \
do_mask(mask);
@@ -275,9 +279,9 @@ __cast6_enc_blk8:
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -301,7 +305,7 @@ __cast6_enc_blk8:
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -323,9 +327,9 @@ __cast6_dec_blk8:
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -349,7 +353,7 @@ __cast6_dec_blk8:
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index 8e49ce117494..4bbd3ec78df5 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -138,21 +138,29 @@
movzbl RW0bl, RT2d; \
movzbl RW0bh, RT3d; \
shrq $16, RW0; \
- movq s8(, RT0, 8), RT0; \
- xorq s6(, RT1, 8), to; \
+ leaq s8(%rip), RW1; \
+ movq (RW1, RT0, 8), RT0; \
+ leaq s6(%rip), RW1; \
+ xorq (RW1, RT1, 8), to; \
movzbl RW0bl, RL1d; \
movzbl RW0bh, RT1d; \
shrl $16, RW0d; \
- xorq s4(, RT2, 8), RT0; \
- xorq s2(, RT3, 8), to; \
+ leaq s4(%rip), RW1; \
+ xorq (RW1, RT2, 8), RT0; \
+ leaq s2(%rip), RW1; \
+ xorq (RW1, RT3, 8), to; \
movzbl RW0bl, RT2d; \
movzbl RW0bh, RT3d; \
- xorq s7(, RL1, 8), RT0; \
- xorq s5(, RT1, 8), to; \
- xorq s3(, RT2, 8), RT0; \
+ leaq s7(%rip), RW1; \
+ xorq (RW1, RL1, 8), RT0; \
+ leaq s5(%rip), RW1; \
+ xorq (RW1, RT1, 8), to; \
+ leaq s3(%rip), RW1; \
+ xorq (RW1, RT2, 8), RT0; \
load_next_key(n, RW0); \
xorq RT0, to; \
- xorq s1(, RT3, 8), to; \
+ leaq s1(%rip), RW1; \
+ xorq (RW1, RT3, 8), to; \
#define load_next_key(n, RWx) \
movq (((n) + 1) * 8)(CTX), RWx;
@@ -364,65 +372,89 @@ ENDPROC(des3_ede_x86_64_crypt_blk)
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrq $16, RW0; \
- xorq s8(, RT3, 8), to##0; \
- xorq s6(, RT1, 8), to##0; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrq $16, RW0; \
- xorq s4(, RT3, 8), to##0; \
- xorq s2(, RT1, 8), to##0; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrl $16, RW0d; \
- xorq s7(, RT3, 8), to##0; \
- xorq s5(, RT1, 8), to##0; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
load_next_key(n, RW0); \
- xorq s3(, RT3, 8), to##0; \
- xorq s1(, RT1, 8), to##0; \
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
xorq from##1, RW1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrq $16, RW1; \
- xorq s8(, RT3, 8), to##1; \
- xorq s6(, RT1, 8), to##1; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrq $16, RW1; \
- xorq s4(, RT3, 8), to##1; \
- xorq s2(, RT1, 8), to##1; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrl $16, RW1d; \
- xorq s7(, RT3, 8), to##1; \
- xorq s5(, RT1, 8), to##1; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
do_movq(RW0, RW1); \
- xorq s3(, RT3, 8), to##1; \
- xorq s1(, RT1, 8), to##1; \
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
xorq from##2, RW2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrq $16, RW2; \
- xorq s8(, RT3, 8), to##2; \
- xorq s6(, RT1, 8), to##2; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrq $16, RW2; \
- xorq s4(, RT3, 8), to##2; \
- xorq s2(, RT1, 8), to##2; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrl $16, RW2d; \
- xorq s7(, RT3, 8), to##2; \
- xorq s5(, RT1, 8), to##2; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
do_movq(RW0, RW2); \
- xorq s3(, RT3, 8), to##2; \
- xorq s1(, RT1, 8), to##2;
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2;
#define __movq(src, dst) \
movq src, dst;
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index f94375a8dcd1..d56a281221fb 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -97,7 +97,7 @@ ENTRY(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
movups (%rsi), SHASH
- movaps .Lbswap_mask, BSWAP
+ movaps .Lbswap_mask(%rip), BSWAP
PSHUFB_XMM BSWAP DATA
call __clmul_gf128mul_ble
PSHUFB_XMM BSWAP DATA
@@ -114,7 +114,7 @@ ENTRY(clmul_ghash_update)
FRAME_BEGIN
cmp $16, %rdx
jb .Lupdate_just_ret # check length
- movaps .Lbswap_mask, BSWAP
+ movaps .Lbswap_mask(%rip), BSWAP
movups (%rdi), DATA
movups (%rcx), SHASH
PSHUFB_XMM BSWAP DATA
diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index 02ee2308fb38..8a49ab1699ef 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -54,7 +54,7 @@
#define load_ctr_8way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t1, t2) \
vpcmpeqd t0, t0, t0; \
vpsrldq $8, t0, t0; /* low: -1, high: 0 */ \
- vmovdqa bswap, t1; \
+ vmovdqa bswap(%rip), t1; \
\
/* load IV and byteswap */ \
vmovdqu (iv), x7; \
@@ -99,7 +99,7 @@
#define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
t1, xts_gf128mul_and_shl1_mask) \
- vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+ vmovdqa xts_gf128mul_and_shl1_mask(%rip), t0; \
\
/* load IV */ \
vmovdqu (iv), tiv; \
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S
index a53ac11dd385..e04c80467bd2 100644
--- a/arch/x86/crypto/glue_helper-asm-avx2.S
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -67,7 +67,7 @@
vmovdqu (iv), t2x; \
vmovdqa t2x, t3x; \
inc_le128(t2x, t0x, t1x); \
- vbroadcasti128 bswap, t1; \
+ vbroadcasti128 bswap(%rip), t1; \
vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
vpshufb t1, t2, x0; \
\
@@ -124,13 +124,13 @@
tivx, t0, t0x, t1, t1x, t2, t2x, t3, \
xts_gf128mul_and_shl1_mask_0, \
xts_gf128mul_and_shl1_mask_1) \
- vbroadcasti128 xts_gf128mul_and_shl1_mask_0, t1; \
+ vbroadcasti128 xts_gf128mul_and_shl1_mask_0(%rip), t1; \
\
/* load IV and construct second IV */ \
vmovdqu (iv), tivx; \
vmovdqa tivx, t0x; \
gf128mul_x_ble(tivx, t1x, t2x); \
- vbroadcasti128 xts_gf128mul_and_shl1_mask_1, t2; \
+ vbroadcasti128 xts_gf128mul_and_shl1_mask_1(%rip), t2; \
vinserti128 $1, tivx, t0, tiv; \
vpxor (0*32)(src), tiv, x0; \
vmovdqu tiv, (0*32)(dst); \
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 1420db15dcdd..2ced4b2f6c76 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -588,37 +588,42 @@ last_block_enter:
mov INP, _INP(%rsp)
## schedule 48 input dwords, by doing 3 rounds of 12 each
- xor SRND, SRND
+ leaq K256(%rip), SRND
+ ## loop1 upper bound
+ leaq K256+3*4*32(%rip), INP
.align 16
loop1:
- vpaddd K256+0*32(SRND), X0, XFER
+ vpaddd 0*32(SRND), X0, XFER
vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 0*32
- vpaddd K256+1*32(SRND), X0, XFER
+ vpaddd 1*32(SRND), X0, XFER
vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 1*32
- vpaddd K256+2*32(SRND), X0, XFER
+ vpaddd 2*32(SRND), X0, XFER
vmovdqa XFER, 2*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 2*32
- vpaddd K256+3*32(SRND), X0, XFER
+ vpaddd 3*32(SRND), X0, XFER
vmovdqa XFER, 3*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 3*32
add $4*32, SRND
- cmp $3*4*32, SRND
+ cmp INP, SRND
jb loop1
+ ## loop2 upper bound
+ leaq K256+4*4*32(%rip), INP
+
loop2:
## Do last 16 rounds with no scheduling
- vpaddd K256+0*32(SRND), X0, XFER
+ vpaddd 0*32(SRND), X0, XFER
vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
DO_4ROUNDS _XFER + 0*32
- vpaddd K256+1*32(SRND), X1, XFER
+ vpaddd 1*32(SRND), X1, XFER
vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
DO_4ROUNDS _XFER + 1*32
add $2*32, SRND
@@ -626,7 +631,7 @@ loop2:
vmovdqa X2, X0
vmovdqa X3, X1
- cmp $4*4*32, SRND
+ cmp INP, SRND
jb loop2
mov _CTX(%rsp), CTX
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 02/27] x86: Use symbol name on bug table for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Replace the %c operand modifier with %P. %c is incompatible with PIE
because it implies an immediate value, whereas %P references a symbol.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/include/asm/bug.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..3d690a4abf50 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -35,7 +35,7 @@ do { \
asm volatile("1:\t" ins "\n" \
".pushsection __bug_table,\"aw\"\n" \
"2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n" \
- "\t" __BUG_REL(%c0) "\t# bug_entry::file\n" \
+ "\t" __BUG_REL(%P0) "\t# bug_entry::file\n" \
"\t.word %c1" "\t# bug_entry::line\n" \
"\t.word %c2" "\t# bug_entry::flags\n" \
"\t.org 2b+%c3\n" \
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 03/27] x86: Use symbol name in jump table for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Replace the %c operand modifier with %P. %c is incompatible with PIE
because it implies an immediate value, whereas %P references a symbol.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
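For illustration only: the diff below passes the key and the branch flag
as one address operand, &((char *)key)[branch], instead of two
immediates. A minimal C sketch of how a single pointer-sized value can
carry both, assuming the key object is at least 2-byte aligned so its
low address bit is free. All sketch_* names are hypothetical and are not
kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a static key; int alignment keeps bit 0 clear. */
struct sketch_key {
	int enabled;
};

/* Fold the branch flag into the key address, as the patched operand does. */
uintptr_t sketch_encode(struct sketch_key *key, bool branch)
{
	return (uintptr_t)&((char *)key)[branch];
}

/* Recover the key pointer by masking off the low bit. */
struct sketch_key *sketch_decode_key(uintptr_t entry)
{
	return (struct sketch_key *)(entry & ~(uintptr_t)1);
}

/* Recover the branch flag from the low bit. */
bool sketch_decode_branch(uintptr_t entry)
{
	return (entry & 1) != 0;
}
```

Because the combined value is still a plain symbol-plus-offset address,
it can be emitted as one symbol reference rather than as the sum of two
immediates.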
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/include/asm/jump_label.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..dfdcdc39604a 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -37,9 +37,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
".pushsection __jump_table, \"aw\" \n\t"
_ASM_ALIGN "\n\t"
- _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+ _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
- : : "i" (key), "i" (branch) : : l_yes);
+ : : "X" (&((char *)key)[branch]) : : l_yes);
return false;
l_yes:
@@ -53,9 +53,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
"2:\n\t"
".pushsection __jump_table, \"aw\" \n\t"
_ASM_ALIGN "\n\t"
- _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+ _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
- : : "i" (key), "i" (branch) : : l_yes);
+ : : "X" (&((char *)key)[branch]) : : l_yes);
return false;
l_yes:
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 04/27] x86: Add macro to get symbol address for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used
to replace "_ASM_MOV $<symbol>, %dst" code constructs, which are not
compatible with PIE.
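For illustration only: a standalone sketch of how a two-way selector
macro of this kind can expand to a mnemonic string for use in an asm()
template. The SKETCH_* and sketch_* names are hypothetical, and pointer
width is used here as a stand-in for the kernel's own 32/64-bit
configuration test. On 64-bit, movabsq takes a full 64-bit immediate,
whereas "movq $symbol" is limited to a sign-extended 32-bit immediate.

```c
#include <stdint.h>

/* Turn the chosen mnemonic into a string padded with spaces. */
#define SKETCH_ASM_FORM(x) " " #x " "

#if UINTPTR_MAX > 0xffffffffu
/* 64-bit build: select the second argument. */
# define SKETCH_ASM_SEL(a, b) SKETCH_ASM_FORM(b)
#else
/* 32-bit build: select the first argument. */
# define SKETCH_ASM_SEL(a, b) SKETCH_ASM_FORM(a)
#endif

/* Mirrors the shape of the macro added by this patch. */
#define SKETCH_ASM_MOVABS SKETCH_ASM_SEL(movl, movabsq)

/* Expose the expansion so it can be inspected at run time. */
const char *sketch_movabs_mnemonic(void)
{
	return SKETCH_ASM_MOVABS;
}
```

The resulting string can be concatenated with the operand text in an
asm() template, the same way _ASM_MOV is used today.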
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/include/asm/asm.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..4492a35fad69 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -30,6 +30,7 @@
#define _ASM_ALIGN __ASM_SEL(.balign 4, .balign 8)
#define _ASM_MOV __ASM_SIZE(mov)
+#define _ASM_MOVABS __ASM_SEL(movl, movabsq)
#define _ASM_INC __ASM_SIZE(inc)
#define _ASM_DEC __ASM_SIZE(dec)
#define _ASM_ADD __ASM_SIZE(add)
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:53 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols for the
kernel to be PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/kernel/relocate_kernel_64.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11eda21eb697..a7227dfe1a2b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,9 +208,11 @@ identity_mapped:
movq %rax, %cr3
lea PAGE_SIZE(%r8), %rsp
call swap_pages
- movq $virtual_mapped, %rax
- pushq %rax
- ret
+ jmp *virtual_mapped_addr(%rip)
+
+ /* Absolute value for PIE support */
+virtual_mapped_addr:
+ .quad virtual_mapped
virtual_mapped:
movq RSP(%r8), %rsp
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 06/27] x86/entry/64: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols for the
kernel to be PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
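For illustration only: a minimal user-space sketch of the
"leaq label(%rip), reg" pattern that this patch substitutes for
"movq $label, reg". The function name sketch_here is hypothetical. The
x86-64 branch assumes a GCC- or Clang-compatible compiler with AT&T
syntax; the fallback uses the GNU C labels-as-values extension so the
sketch still builds on other targets.

```c
#include <stddef.h>

/* Return an address inside this function's own machine code. */
void *sketch_here(void)
{
	void *pc;
#if defined(__x86_64__)
	/*
	 * RIP-relative: the instruction encodes the distance to label 1,
	 * so no absolute address is embedded in the instruction stream.
	 */
	__asm__ volatile("leaq 1f(%%rip), %0\n"
			 "1:"
			 : "=r" (pc));
#else
	/* Non-x86-64 fallback: take the address of a local label. */
	pc = &&here;
here:
#endif
	return pc;
}
```

Since the operand is a displacement from the current instruction, the
code yields the correct address wherever the image is loaded.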
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/entry/entry_64.S | 18 ++++++++++++------
arch/x86/kernel/relocate_kernel_64.S | 8 +++-----
2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c9648b287d7f..8638dca78191 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -191,7 +191,7 @@ ENTRY(entry_SYSCALL_64_trampoline)
* spill RDI and restore it in a second-stage trampoline.
*/
pushq %rdi
- movq $entry_SYSCALL_64_stage2, %rdi
+ movabsq $entry_SYSCALL_64_stage2, %rdi
JMP_NOSPEC %rdi
END(entry_SYSCALL_64_trampoline)
@@ -1279,7 +1279,8 @@ ENTRY(error_entry)
movl %ecx, %eax /* zero extend */
cmpq %rax, RIP+8(%rsp)
je .Lbstep_iret
- cmpq $.Lgs_change, RIP+8(%rsp)
+ leaq .Lgs_change(%rip), %rcx
+ cmpq %rcx, RIP+8(%rsp)
jne .Lerror_entry_done
/*
@@ -1484,10 +1485,10 @@ ENTRY(nmi)
* resume the outer NMI.
*/
- movq $repeat_nmi, %rdx
+ leaq repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja 1f
- movq $end_repeat_nmi, %rdx
+ leaq end_repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja nested_nmi_out
1:
@@ -1541,7 +1542,8 @@ nested_nmi:
pushq %rdx
pushfq
pushq $__KERNEL_CS
- pushq $repeat_nmi
+ leaq repeat_nmi(%rip), %rdx
+ pushq %rdx
/* Put stack back */
addq $(6*8), %rsp
@@ -1580,7 +1582,11 @@ first_nmi:
addq $8, (%rsp) /* Fix up RSP */
pushfq /* RFLAGS */
pushq $__KERNEL_CS /* CS */
- pushq $1f /* RIP */
+ pushq $0 /* Future return address */
+ pushq %rax /* Save RAX */
+ leaq 1f(%rip), %rax /* RIP */
+ movq %rax, 8(%rsp) /* Put 1f on return address */
+ popq %rax /* Restore RAX */
iretq /* continues at repeat_nmi below */
UNWIND_HINT_IRET_REGS
1:
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index a7227dfe1a2b..0c0fc259a4e2 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,11 +208,9 @@ identity_mapped:
movq %rax, %cr3
lea PAGE_SIZE(%r8), %rsp
call swap_pages
- jmp *virtual_mapped_addr(%rip)
-
- /* Absolute value for PIE support */
-virtual_mapped_addr:
- .quad virtual_mapped
+ movabsq $virtual_mapped, %rax
+ pushq %rax
+ ret
virtual_mapped:
movq RSP(%r8), %rsp
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 07/27] x86: pm-trace - Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Use the new _ASM_MOVABS macro instead of _ASM_MOV so that the assembly
is PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/include/asm/pm-trace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..972070806ce9 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,7 +8,7 @@
do { \
if (pm_trace_enabled) { \
const void *tracedata; \
- asm volatile(_ASM_MOV " $1f,%0\n" \
+ asm volatile(_ASM_MOVABS " $1f,%0\n" \
".section .tracedata,\"a\"\n" \
"1:\t.word %c1\n\t" \
_ASM_PTR " %c2\n" \
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 08/27] x86/CPU: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols for the
kernel to be PIE compatible. Use the new _ASM_MOVABS macro instead of
the 'mov $symbol, %dst' construct.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/include/asm/processor.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c119d423eacb..81ae6877df29 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -50,7 +50,7 @@ static inline void *current_text_addr(void)
{
void *pc;
- asm volatile("mov $1f, %0; 1:":"=r" (pc));
+ asm volatile(_ASM_MOVABS " $1f, %0; 1:":"=r" (pc));
return pc;
}
@@ -718,6 +718,7 @@ static inline void sync_core(void)
: ASM_CALL_CONSTRAINT : : "memory");
#else
unsigned int tmp;
+ unsigned long tmp2;
asm volatile (
UNWIND_HINT_SAVE
@@ -728,11 +729,13 @@ static inline void sync_core(void)
"pushfq\n\t"
"mov %%cs, %0\n\t"
"pushq %q0\n\t"
- "pushq $1f\n\t"
+ "leaq 1f(%%rip), %1\n\t"
+ "pushq %1\n\t"
"iretq\n\t"
UNWIND_HINT_RESTORE
"1:"
- : "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+ : "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+ : : "cc", "memory");
#endif
}
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
* Hooray, we are in Long 64-bit mode (but still running in low memory)
*/
ENTRY(wakeup_long64)
- movq saved_magic, %rax
+ movq saved_magic(%rip), %rax
movq $0x123456789abcdef0, %rdx
cmpq %rdx, %rax
jne bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
movw %ax, %es
movw %ax, %fs
movw %ax, %gs
- movq saved_rsp, %rsp
+ movq saved_rsp(%rip), %rsp
- movq saved_rbx, %rbx
- movq saved_rdi, %rdi
- movq saved_rsi, %rsi
- movq saved_rbp, %rbp
+ movq saved_rbx(%rip), %rbx
+ movq saved_rdi(%rip), %rdi
+ movq saved_rsi(%rip), %rsi
+ movq saved_rbp(%rip), %rbp
- movq saved_rip, %rax
+ movq saved_rip(%rip), %rax
jmp *%rax
ENDPROC(wakeup_long64)
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
xorl %eax, %eax
call save_processor_state
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq %rsp, pt_regs_sp(%rax)
movq %rbp, pt_regs_bp(%rax)
movq %rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
pushfq
popq pt_regs_flags(%rax)
- movq $.Lresume_point, saved_rip(%rip)
+ leaq .Lresume_point(%rip), %rax
+ movq %rax, saved_rip(%rip)
- movq %rsp, saved_rsp
- movq %rbp, saved_rbp
- movq %rbx, saved_rbx
- movq %rdi, saved_rdi
- movq %rsi, saved_rsi
+ movq %rsp, saved_rsp(%rip)
+ movq %rbp, saved_rbp(%rip)
+ movq %rbx, saved_rbx(%rip)
+ movq %rdi, saved_rdi(%rip)
+ movq %rsi, saved_rsi(%rip)
addq $8, %rsp
movl $3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
.align 4
.Lresume_point:
/* We don't restore %rax, it must be 0 anyway */
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq saved_context_cr4(%rax), %rbx
movq %rbx, %cr4
movq saved_context_cr3(%rax), %rbx
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 10/27] x86/boot/64: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.
Early at boot, the kernel is mapped at a temporary address while the page
table is being prepared. To work out the page table changes needed for
KASLR, the boot code calculates the difference between the expected address
of the kernel and the one chosen by KASLR. This does not work with PIE
because all symbol references in code are relative: instead of the future
relocated virtual address, the code gets the address in the current
temporary mapping. The solution is to use global variables, which are
relocated as expected.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
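The difference between the two kinds of reference can be sketched with plain arithmetic. This is a simplified model under stated assumptions, not kernel code: LINK_BASE stands in for __START_KERNEL_map, the helper names rip_relative_addr() and stored_offset() are hypothetical, and KASLR relocation of the data word is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: stands in for __START_KERNEL_map. */
#define LINK_BASE 0xffffffff80000000ULL

/*
 * Address produced by a RIP-relative reference such as "lea sym(%rip)"
 * when an image linked at LINK_BASE is currently executing at run_base.
 * The result follows the current mapping, not the link-time address.
 */
static uint64_t rip_relative_addr(uint64_t sym_link_addr, uint64_t run_base)
{
    assert(sym_link_addr >= LINK_BASE);
    return run_base + (sym_link_addr - LINK_BASE);
}

/*
 * Value held by a data word such as ".quad sym - __START_KERNEL_map".
 * It is computed from link-time addresses only, so in this model it does
 * not depend on where the code is running when the word is loaded.
 */
static uint64_t stored_offset(uint64_t sym_link_addr)
{
    assert(sym_link_addr >= LINK_BASE);
    return sym_link_addr - LINK_BASE;
}
```

In this model, code running from a temporary mapping gets a different result from rip_relative_addr() than the link-time address, while stored_offset() is the same in both cases. This matches the reasoning for loading _early_top_pgt_offset and _init_top_offset from memory.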
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 8344dd2f310a..7c8f7ce93b9e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -89,8 +89,9 @@ startup_64:
popq %rsi
/* Form the CR3 value being sure to include the CR3 modifier */
- addq $(early_top_pgt - __START_KERNEL_map), %rax
+ addq _early_top_pgt_offset(%rip), %rax
jmp 1f
+
ENTRY(secondary_startup_64)
UNWIND_HINT_EMPTY
/*
@@ -119,7 +120,7 @@ ENTRY(secondary_startup_64)
popq %rsi
/* Form the CR3 value being sure to include the CR3 modifier */
- addq $(init_top_pgt - __START_KERNEL_map), %rax
+ addq _init_top_offset(%rip), %rax
1:
/* Enable PAE mode, PGE and LA57 */
@@ -137,7 +138,7 @@ ENTRY(secondary_startup_64)
movq %rax, %cr3
/* Ensure I am executing from virtual addresses */
- movq $1f, %rax
+ movabs $1f, %rax
ANNOTATE_RETPOLINE_SAFE
jmp *%rax
1:
@@ -234,11 +235,12 @@ ENTRY(secondary_startup_64)
* REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
* address given in m16:64.
*/
- pushq $.Lafter_lret # put return address on stack for unwinder
+ leaq .Lafter_lret(%rip), %rax
+ pushq %rax # put return address on stack for unwinder
xorq %rbp, %rbp # clear frame pointer
- movq initial_code(%rip), %rax
+ leaq initial_code(%rip), %rax
pushq $__KERNEL_CS # set correct cs
- pushq %rax # target address in negative space
+ pushq (%rax) # target address in negative space
lretq
.Lafter_lret:
END(secondary_startup_64)
@@ -342,6 +344,18 @@ END(early_idt_handler_common)
GLOBAL(early_recursion_flag)
.long 0
+ /*
+ * Position Independent Code takes only relative references in code
+ * meaning a global variable address is relative to RIP and not its
+ * future virtual address. Global variables can be used instead as they
+ * are still relocated on the expected kernel mapping address.
+ */
+ .align 8
+_early_top_pgt_offset:
+ .quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+ .quad init_top_pgt - __START_KERNEL_map
+
#define NEXT_PAGE(name) \
.balign PAGE_SIZE; \
GLOBAL(name)
--
2.17.0.441.gb46fe60e1d-goog
* [PATCH v3 11/27] x86/power/64: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-23 19:54 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
Greg Kroah-Hartman, Thomas Garnier, Philippe Ombredanne,
Kate Stewart, Arnaldo Carvalho de Melo, Yonghong Song,
Andrey Ryabinin, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
Andy Lutomirski, Dominik Brodowski, Borislav Petkov,
Borislav Petkov, Rafael J . Wysocki
Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
virtualization, linux-sparse, linux-crypto, kernel-hardening,
xen-devel
In-Reply-To: <20180523195421.180248-1-thgarnie@google.com>
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.
Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
arch/x86/power/hibernate_asm_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
#include <asm/frame.h>
ENTRY(swsusp_arch_suspend)
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq %rsp, pt_regs_sp(%rax)
movq %rbp, pt_regs_bp(%rax)
movq %rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
movq %rax, %cr4; # turn PGE back on
/* We don't restore %rax, it must be 0 anyway */
- movq $saved_context, %rax
+ leaq saved_context(%rip), %rax
movq pt_regs_sp(%rax), %rsp
movq pt_regs_bp(%rax), %rbp
movq pt_regs_si(%rax), %rsi
--
2.17.0.441.gb46fe60e1d-goog