* [PULL 00/18] Block patches
From: Max Reitz @ 2020-02-20 16:06 UTC (permalink / raw)
To: qemu-block; +Cc: Kevin Wolf, Peter Maydell, qemu-devel, Max Reitz
The following changes since commit 672f9d0df10a68a5c5f2b32cbc8284abf9f5ee18:
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2020-02-18 14:23:43 +0000)
are available in the Git repository at:
https://github.com/XanClic/qemu.git tags/pull-block-2020-02-20
for you to fetch changes up to dff8d44c96f128480430b0c59ed8760917dbd427:
iotests: Test snapshot -l field separation (2020-02-20 16:43:42 +0100)
----------------------------------------------------------------
Block patches:
- qemu-img convert: New --target-is-zero parameter
- qcow2: Specify non-default compression type flag
- optionally flat output for query-named-block-nodes
- some fixes
- pseudo-creation of images on block devices is now done by a generic
block layer function
----------------------------------------------------------------
Daniel P. Berrangé (1):
block: always fill entire LUKS header space with zeros
David Edmondson (1):
qemu-img: Add --target-is-zero to convert
Max Reitz (11):
iotests/147: Fix drive parameters
iotests/279: Fix for non-qcow2 formats
block/nbd: Fix hang in .bdrv_close()
block: Generic file creation fallback
file-posix: Drop hdev_co_create_opts()
iscsi: Drop iscsi_co_create_opts()
iotests: Add test for image creation fallback
qemu-img: Fix convert -n -B for backing-less targets
iotests: Test convert -n -B to backing-less target
block: Fix VM size field width in snapshot dump
iotests: Test snapshot -l field separation
Peter Krempa (1):
qapi: Allow getting flat output from 'query-named-block-nodes'
Thomas Huth (1):
iotests: Remove the superfluous 2nd check for the availability of
quorum
Vladimir Sementsov-Ogievskiy (3):
docs: improve qcow2 spec about extending image header
docs: qcow2: introduce compression type feature
block/backup-top: fix flags handling
block.c | 164 +++++++++++++++++++++++++++++++++----
block/backup-top.c | 31 ++++---
block/file-posix.c | 67 ---------------
block/iscsi.c | 56 -------------
block/nbd.c | 14 +++-
block/qapi.c | 15 +++-
block/qcow2.c | 11 ++-
blockdev.c | 8 +-
docs/interop/qcow2.txt | 64 ++++++++++++++-
docs/interop/qemu-img.rst | 9 +-
include/block/block.h | 2 +-
include/block/qapi.h | 4 +-
monitor/hmp-cmds.c | 2 +-
qapi/block-core.json | 7 +-
qemu-img-cmds.hx | 4 +-
qemu-img.c | 28 ++++++-
tests/qemu-iotests/122 | 14 ++++
tests/qemu-iotests/122.out | 5 ++
tests/qemu-iotests/139 | 3 -
tests/qemu-iotests/147 | 2 +-
tests/qemu-iotests/259 | 62 ++++++++++++++
tests/qemu-iotests/259.out | 14 ++++
tests/qemu-iotests/279 | 7 +-
tests/qemu-iotests/284 | 97 ++++++++++++++++++++++
tests/qemu-iotests/284.out | 62 ++++++++++++++
tests/qemu-iotests/286 | 76 +++++++++++++++++
tests/qemu-iotests/286.out | 8 ++
tests/qemu-iotests/group | 3 +
28 files changed, 659 insertions(+), 180 deletions(-)
create mode 100755 tests/qemu-iotests/259
create mode 100644 tests/qemu-iotests/259.out
create mode 100755 tests/qemu-iotests/284
create mode 100644 tests/qemu-iotests/284.out
create mode 100755 tests/qemu-iotests/286
create mode 100644 tests/qemu-iotests/286.out
--
2.24.1
^ permalink raw reply
* [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: linux-s390, Janosch Frank, Lendacky, Thomas, Cornelia Huck,
Ram Pai, linux-kernel, virtualization, Halil Pasic,
Christian Borntraeger, iommu, Michael Mueller, Viktor Mihajlovski,
David Gibson
In-Reply-To: <20200220160606.53156-1-pasic@linux.ibm.com>
Currently force_dma_unencrypted() is only used by the direct
implementation of the DMA API, and thus resides in dma-direct.h. But
there is nothing dma-direct specific about it: if one was -- for
whatever reason -- to implement custom DMA ops that have to in the
encrypted/protected scenarios dma-direct currently deals with, one would
need exactly this kind of information.
More importantly, virtio has to use DMA API (so that the memory
encryption (x86) or protection (power, s390) is handled) under the very
same circumstances force_dma_unencrypted() returns true. Furthermore,
the in these cases the reason to go via the DMA API is distinct,
compared to the reason indicated by VIRTIO_F_IOMMU_PLATFORM: we need to
use DMA API independently of the device's properties with regards to
access to memory. I.e. the addresses in the descriptors are still guest
physical addresses, the device may still be implemented by a SMP CPU,
and thus the device shall use those without any further translation. See
[1].
Let's move force_dma_unencrypted() the so virtio, or other
implementations of DMA ops can make the right decisions.
[1] https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4100006
(In the spec VIRTIO_F_IOMMU_PLATFORM is called
VIRTIO_F_ACCESS_PLATFORM).
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
---
include/linux/dma-direct.h | 9 ---------
include/linux/mem_encrypt.h | 10 ++++++++++
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..590b15d881b0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -26,15 +26,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
}
#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
-#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
-bool force_dma_unencrypted(struct device *dev);
-#else
-static inline bool force_dma_unencrypted(struct device *dev)
-{
- return false;
-}
-#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-
/*
* If memory encryption is supported, phys_to_dma will set the memory encryption
* bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 5c4a18a91f89..64a48c4b01a2 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,6 +22,16 @@ static inline bool mem_encrypt_active(void) { return false; }
#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
+struct device;
+#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
+bool force_dma_unencrypted(struct device *dev);
+#else
+static inline bool force_dma_unencrypted(struct device *dev)
+{
+ return false;
+}
+#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
* The __sme_set() and __sme_clr() macros are useful for adding or removing
--
2.17.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply related
* [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: linux-s390, Janosch Frank, Lendacky, Thomas, Cornelia Huck,
Ram Pai, linux-kernel, virtualization, Halil Pasic,
Christian Borntraeger, iommu, Michael Mueller, Viktor Mihajlovski,
David Gibson
Currently if one intends to run a memory protection enabled VM with
virtio devices and linux as the guest OS, one needs to specify the
VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
linux use the DMA API, which in turn handles the memory
encryption/protection stuff if the guest decides to turn itself into
a protected one. This however makes no sense due to multiple reasons:
* The device is not changed by the fact that the guest RAM is
protected. The so called IOMMU bypass quirk is not affected.
* This usage is not congruent with standardised semantics of
VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason
for using DMA API in virtio (orthogonal with respect to what is
expressed by VIRTIO_F_IOMMU_PLATFORM).
This series aims to decouple 'have to use DMA API because my (guest) RAM
is protected' and 'have to use DMA API because the device told me
VIRTIO_F_IOMMU_PLATFORM'.
Please find more detailed explanations about the conceptual aspects in
the individual patches. There is however also a very practical problem
that is addressed by this series.
For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
effect The vhost code assumes it the addresses on the virtio descriptor
ring are not guest physical addresses but iova's, and insists on doing a
translation of these regardless of what transport is used (e.g. whether
we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
"vhost: new device IOTLB API".) On s390 this results in severe
performance degradation (c.a. factor 10). BTW with ccw I/O there is
(architecturally) no IOMMU, so the whole address translation makes no
sense in the context of virtio-ccw.
Halil Pasic (2):
mm: move force_dma_unencrypted() to mem_encrypt.h
virtio: let virtio use DMA API when guest RAM is protected
drivers/virtio/virtio_ring.c | 3 +++
include/linux/dma-direct.h | 9 ---------
include/linux/mem_encrypt.h | 10 ++++++++++
3 files changed, 13 insertions(+), 9 deletions(-)
base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
--
2.17.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply
* [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: Halil Pasic, linux-s390, virtualization, linux-kernel, iommu,
Christian Borntraeger, Janosch Frank, Viktor Mihajlovski,
Cornelia Huck, Ram Pai, Thiago Jung Bauermann, David Gibson,
Lendacky, Thomas, Michael Mueller
Currently if one intends to run a memory protection enabled VM with
virtio devices and linux as the guest OS, one needs to specify the
VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
linux use the DMA API, which in turn handles the memory
encryption/protection stuff if the guest decides to turn itself into
a protected one. This however makes no sense due to multiple reasons:
* The device is not changed by the fact that the guest RAM is
protected. The so called IOMMU bypass quirk is not affected.
* This usage is not congruent with standardised semantics of
VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason
for using DMA API in virtio (orthogonal with respect to what is
expressed by VIRTIO_F_IOMMU_PLATFORM).
This series aims to decouple 'have to use DMA API because my (guest) RAM
is protected' and 'have to use DMA API because the device told me
VIRTIO_F_IOMMU_PLATFORM'.
Please find more detailed explanations about the conceptual aspects in
the individual patches. There is however also a very practical problem
that is addressed by this series.
For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
effect The vhost code assumes it the addresses on the virtio descriptor
ring are not guest physical addresses but iova's, and insists on doing a
translation of these regardless of what transport is used (e.g. whether
we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
"vhost: new device IOTLB API".) On s390 this results in severe
performance degradation (c.a. factor 10). BTW with ccw I/O there is
(architecturally) no IOMMU, so the whole address translation makes no
sense in the context of virtio-ccw.
Halil Pasic (2):
mm: move force_dma_unencrypted() to mem_encrypt.h
virtio: let virtio use DMA API when guest RAM is protected
drivers/virtio/virtio_ring.c | 3 +++
include/linux/dma-direct.h | 9 ---------
include/linux/mem_encrypt.h | 10 ++++++++++
3 files changed, 13 insertions(+), 9 deletions(-)
base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
--
2.17.1
^ permalink raw reply
* [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: Halil Pasic, linux-s390, virtualization, linux-kernel, iommu,
Christian Borntraeger, Janosch Frank, Viktor Mihajlovski,
Cornelia Huck, Ram Pai, Thiago Jung Bauermann, David Gibson,
Lendacky, Thomas, Michael Mueller
In-Reply-To: <20200220160606.53156-1-pasic@linux.ibm.com>
Currently force_dma_unencrypted() is only used by the direct
implementation of the DMA API, and thus resides in dma-direct.h. But
there is nothing dma-direct specific about it: if one was -- for
whatever reason -- to implement custom DMA ops that have to in the
encrypted/protected scenarios dma-direct currently deals with, one would
need exactly this kind of information.
More importantly, virtio has to use DMA API (so that the memory
encryption (x86) or protection (power, s390) is handled) under the very
same circumstances force_dma_unencrypted() returns true. Furthermore,
the in these cases the reason to go via the DMA API is distinct,
compared to the reason indicated by VIRTIO_F_IOMMU_PLATFORM: we need to
use DMA API independently of the device's properties with regards to
access to memory. I.e. the addresses in the descriptors are still guest
physical addresses, the device may still be implemented by a SMP CPU,
and thus the device shall use those without any further translation. See
[1].
Let's move force_dma_unencrypted() the so virtio, or other
implementations of DMA ops can make the right decisions.
[1] https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4100006
(In the spec VIRTIO_F_IOMMU_PLATFORM is called
VIRTIO_F_ACCESS_PLATFORM).
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
---
include/linux/dma-direct.h | 9 ---------
include/linux/mem_encrypt.h | 10 ++++++++++
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..590b15d881b0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -26,15 +26,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
}
#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
-#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
-bool force_dma_unencrypted(struct device *dev);
-#else
-static inline bool force_dma_unencrypted(struct device *dev)
-{
- return false;
-}
-#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-
/*
* If memory encryption is supported, phys_to_dma will set the memory encryption
* bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 5c4a18a91f89..64a48c4b01a2 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,6 +22,16 @@ static inline bool mem_encrypt_active(void) { return false; }
#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
+struct device;
+#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
+bool force_dma_unencrypted(struct device *dev);
+#else
+static inline bool force_dma_unencrypted(struct device *dev)
+{
+ return false;
+}
+#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
* The __sme_set() and __sme_clr() macros are useful for adding or removing
--
2.17.1
^ permalink raw reply related
* Re: Bug: Git: Clone: University Network: No Output on Terminal
From: René Scharfe @ 2020-02-20 16:07 UTC (permalink / raw)
To: Manish Devgan; +Cc: git, brian m. carlson, Junio C Hamano
In-Reply-To: <CABVXwf6mPXWof3RHshDo=FSX8dvqUYDFH6-ecaCXDi4KhY8QJQ@mail.gmail.com>
Am 20.02.20 um 11:20 schrieb Manish Devgan:
> This is great. But perhaps I would want the output to come
> irrespective of --progress argument in case of dumb_http.
It's on by default if you use git on a terminal (as opposed to calling
it from a script).
René
^ permalink raw reply
* Re: [PATCH v2 10/17] iio: cros_ec: Use cros_ec_cmd()
From: Enric Balletbo i Serra @ 2020-02-20 16:07 UTC (permalink / raw)
To: Prashant Malani
Cc: Gwendal Grignou, Jonathan Cameron, Linux Kernel Mailing List,
Hartmut Knaack, Lars-Peter Clausen, Peter Meerwald-Stadler,
Benson Leung, Guenter Roeck, Lee Jones, Fabien Lahoudere,
open list:IIO SUBSYSTEM AND DRIVERS
In-Reply-To: <20200218183004.GA184561@google.com>
Hi Prashant,
Could you base your next series on top of [1]. Also, if you can give your
feedback and test those, would be much appreciated ;-)
BTW, I think you need to fix your sendmail as the series are not threaded and
appear as independent patches in patchwork, which is a bit hard to follow.
Thanks,
Enric
[1] https://lore.kernel.org/patchwork/cover/1197210/
On 18/2/20 19:30, Prashant Malani wrote:
> Hi All,
>
> Just thought I'd ping this thread since it's been a week since the last
> email.
>
> On Mon, Feb 10, 2020 at 12:14:01PM -0800, Prashant Malani wrote:
>> Hi All (trimming most code parts of the thread for the sake of brevity),
>>
>> Thanks for listing the points Enric, Please see my notes inline:
>>
>> On Mon, Feb 10, 2020 at 3:03 AM Enric Balletbo i Serra
>> <enric.balletbo@collabora.com> wrote:
>>>
>>> Hi Gwendal, Prashant et all
>>>
>>> On 7/2/20 19:47, Gwendal Grignou wrote:
>>>> On Thu, Feb 6, 2020 at 10:50 AM Prashant Malani <pmalani@chromium.org> wrote:
>>>>>
>>>>> Hi Enric,
>>>>>
>>>>> Thanks for taking a look at the patch. Please see my response inline:
>> ....
>>>>>>>> @@ -171,9 +162,11 @@ int cros_ec_motion_send_host_cmd(struct cros_ec_sensors_core_state *state,
>>>>>>>>
>>>>>>>> memcpy(state->msg->data, &state->param, sizeof(state->param));
>>>>>>>>
>>>>>>>> - ret = cros_ec_cmd_xfer_status(state->ec, state->msg);
>>>>>>>> + ret = cros_ec_cmd_xfer(state->ec, state->msg);
>>>>>>>> if (ret < 0)
>>>>>>>> return ret;
>>>>>>>> + else if (state->msg->result != EC_RES_SUCCESS)
>>>>>>>> + return -EPROTO;
>>>>>>>>
>>>>>>
>>>>>> There is no way to use the new cros_ec_cmd here?
>>>> When the EC does not support sensor fifo,
>>>> cros_ec_motion_send_host_cmd() is on the data path. For instance, it
>>>> is called 2 times every 10ms by chrome to calculate the lid angle. I
>>>> would be reluctant to call malloc. Given it is well encapsulated into
>>>> the sensor stack. Does it make sense to call cros_ec_cmd_xfer
>>>> directly?
>>>>
>>>
>>> Thanks Gwendal for pointing this, it makes totally sense, and I suspect this can
>>> happen on other cases.
>>>
>>> Just to make clear, my concern is not about not using the new 'cros_ec_cmd'
>>> here, is about changing 'cros_ec_cmd_xfer_status' for 'cros_ec_cmd_xfer'. Also,
>>> my other concern is how useful is the new 'cros_ec_cmd' replacing what we have
>>> now if cannot replace all current uses.
>>>
>>> My points of view are this:
>>>
>>> * Actually we have cros_ec_cmd_xfer and cros_ec_cmd_xfer_status, use the second
>>> one is better, in fact, we tried to move all the cros_ec_cmd_xfer to the _status
>>> version in the past because makes the code and error handling cleaner. So I'm
>>> reticent to get back to use cros_ec_cmd_xfer instead of cros_ec_cmd_xfer_status.
>>>
>>> * The users of the cros-ec protocol sometimes they mallocing/freeing at runtime,
>>> and sometimes they don't. IMHO *non* mallocing/freeing is usually better, more
>>> efficient and faster. Would be nice to standardize this.
>>
>> I think we should look at latency (I am assuming that is one of the
>> concerns Gwendal was referring to).
>> We should certainly do more rigorous measurements, but I did a crude
>> measurement across a devm_kzalloc() used on one of the EC commands
>> inside platform/chrome for struct EC command:
>> - Used ktime_get_ns() to record time before and after the devm_kzalloc()
>> - Used ktime_sub to subtract the "after" and "before" values:
>>
>> struct cros_ec_command *msg;
>> int ret;
>> + ktime_t start, end, diff;
>>
>> + start = ktime_get_ns();
>> msg = kzalloc(sizeof(*msg) + max(outsize, insize), GFP_KERNEL);
>> + end = ktime_get_ns();
>> if (!msg)
>> return -ENOMEM;
>>
>> + diff = ktime_sub(end, start);
>> + printk("%s(): TEST: kzalloc took: %lld\n", __func__, ktime_to_ns(diff));
>>
>> On an i5 1.6 GHz system, across 16 call measurements I got the
>> following latency values (in ns):
>> - Count, N:16
>> - Average: 72.375
>> - Std. Dev : 28.768
>> - Max: 143
>> - Min: 51
>>
>> Are these values significant for the various call-sites? I think the
>> driver authors might be able to comment better there (unfortunately I
>> don't have enough context for each case).
>> Of course there will be other overhead (memcpy) but I think this is a
>> good starting point for the discussion.
>> (My apologies if this measurement method is incorrect/inaccurate.)
>
> Any thoughts / comments here?
>
> On an overall note, I think keeping cros_ec_cmd_xfer() and cros_ec_cmd()
> might be a good starting point.
>
> In this way, we are not introducing any extra function. Also, we can
> begin converting the cros_ec_cmd_xfer() use cases (a few call-sites may
> need to be investigated from a latency perspective). The
> cros_ec_cmd_xfer() conversions are better handled in separate patch
> series.
>
> Best regards,
>
> -Prashant
>>
>>>
>>> * If we want to introduce a new 'cros_ec_cmd', this should make the code cleaner
>>> and ideally should be the way we tell the users they should use to communicate
>>> with the cros-ec and not open coding constantly. Ideally, should be a
>>> replacement of all current 'cros_ec_cmd_xfer*' versions.
>>
>> As I mentioned previously, I think all calls of cros_ec_cmd_xfer() can
>> be converted to use cros_ec_cmd() (especially since the new API has a
>> *result pointer),
>> but I think it should be staged out a bit more (since cases like iio:
>> cros_ec driver require non-trivial refactoring which I think is better
>> in a patch/series).
>>
>>>
>>> * If 'cros_ec_cmd' *cannot* replace all the cases, it should be clear to the
>>> user in which cases he should use this function and in which cases shouldn't use
>>> this function.
>>
>> This seems like a good compromise, but my expectation is that if there
>> is a "fast" and "slow" version of the same functionality, developers
>> would be inclined to use the "fast" version always?
>>
>>
>>> * Finally, what pointed Gwendal, what's the best approach to send commands to
>>> the EC by default, is better use dynamic memory? or is better use the stack? is
>>> it always safe use the stack? is always efficient use allocated memory?
>>>
>>> As you can see I have a lot of questions still around, but taking in
>>> consideration that this will be an important change I think that makes sense
>>> spend some time discussing it.
>>>
>>> What do you think?
>>>
>>> Enric
>>>
>>>
>>>> Gwendal.
>>>>>
>>>>> I think it is doable. From looking at the code I felt the factors we
>>>>> need to be careful about are:
>>>>> - The function cros_ec_motion_send_host_cmd() is called from a few
>>>>> other files, each of which set up the struct cros_ec_command
>>>>> differently (reference:
>>>>> https://elixir.bootlin.com/linux/latest/ident/cros_ec_motion_send_host_cmd)
>>>>> - It is not clear to me how readability will be affected by making the
>>>>> change to cros_ec_cmd().
>>>>>
>>>>> Due to the above two factors, but primarily because I wanted to avoid
>>>>> making such an involved large change in this 17 patch series, I
>>>>> reasoned it would be better to make the transition to cros_ec_cmd()
>>>>> for these files in a separate patch/series.
>>>>> My plan after this patch series is to work on this driver(perhaps we
>>>>> can eliminate cros_ec_motion_send_host_cmd() itself?), and then remove
>>>>> cros_ec_cmd_xfer() usage.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>> if (ret &&
>>>>>>>> state->resp != (struct ec_response_motion_sense *)state->msg->data)
>>>>>>>
^ permalink raw reply
* [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: linux-s390, Janosch Frank, Lendacky, Thomas, Cornelia Huck,
Ram Pai, linux-kernel, virtualization, Halil Pasic,
Christian Borntraeger, iommu, Michael Mueller, Viktor Mihajlovski,
David Gibson
In-Reply-To: <20200220160606.53156-1-pasic@linux.ibm.com>
Currently the advanced guest memory protection technologies (AMD SEV,
powerpc secure guest technology and s390 Protected VMs) abuse the
VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
is in turn necessary, to make IO work with guest memory protection.
But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
different beast: with virtio devices whose implementation runs on an SMP
CPU we are still fine with doing all the usual optimizations, it is just
that we need to make sure that the memory protection mechanism does not
get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
side of the guest (and possibly he host side as well) than we actually
need.
An additional benefit of teaching the guest to make the right decision
(and use DMA API) on it's own is: removing the need, to mandate special
VM configuration for guests that may run with protection. This is
especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
the virtio control structures into the first 2G of guest memory:
something we don't necessarily want to do per-default.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
---
drivers/virtio/virtio_ring.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 867c7ebd3f10..fafc8f924955 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (!virtio_has_iommu_quirk(vdev))
return true;
+ if (force_dma_unencrypted(&vdev->dev))
+ return true;
+
/* Otherwise, we are left to guess. */
/*
* In theory, it's possible to have a buggy QEMU-supposed
--
2.17.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply related
* [PATCH] iio: pressure: icp10100: add driver for InvenSense ICP-101xx
From: Jean-Baptiste Maneyrol @ 2020-02-20 16:05 UTC (permalink / raw)
To: jic23, linux-iio; +Cc: Jean-Baptiste Maneyrol
InvenSense ICP-101xx sensors are a family of barometric pressure
and temperature sensor.
These devices are i2c only and use a specific protocol of
commands/responses. Data transfer is secured by using crc8.
Driver provides processed pressure data and temperature raw data.
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com>
---
drivers/iio/pressure/Kconfig | 11 +
drivers/iio/pressure/Makefile | 1 +
drivers/iio/pressure/icp10100.c | 667 ++++++++++++++++++++++++++++++++
3 files changed, 679 insertions(+)
create mode 100644 drivers/iio/pressure/icp10100.c
diff --git a/drivers/iio/pressure/Kconfig b/drivers/iio/pressure/Kconfig
index 9c2d9bf8f100..689b978db4f9 100644
--- a/drivers/iio/pressure/Kconfig
+++ b/drivers/iio/pressure/Kconfig
@@ -101,6 +101,17 @@ config HP03
To compile this driver as a module, choose M here: the module
will be called hp03.
+config ICP10100
+ tristate "InvenSense ICP-101xx pressure and temperature sensor"
+ depends on I2C
+ select CRC8
+ help
+ Say yes here to build support for InvenSense ICP-101xx barometric
+ pressure and temperature sensor.
+
+ To compile this driver as a module, choose M here: the module
+ will be called icp10100.
+
config MPL115
tristate
diff --git a/drivers/iio/pressure/Makefile b/drivers/iio/pressure/Makefile
index 5a79192d8cb5..083ae87ff48f 100644
--- a/drivers/iio/pressure/Makefile
+++ b/drivers/iio/pressure/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_DPS310) += dps310.o
obj-$(CONFIG_IIO_CROS_EC_BARO) += cros_ec_baro.o
obj-$(CONFIG_HID_SENSOR_PRESS) += hid-sensor-press.o
obj-$(CONFIG_HP03) += hp03.o
+obj-$(CONFIG_ICP10100) += icp10100.o
obj-$(CONFIG_MPL115) += mpl115.o
obj-$(CONFIG_MPL115_I2C) += mpl115_i2c.o
obj-$(CONFIG_MPL115_SPI) += mpl115_spi.o
diff --git a/drivers/iio/pressure/icp10100.c b/drivers/iio/pressure/icp10100.c
new file mode 100644
index 000000000000..78eb9f416157
--- /dev/null
+++ b/drivers/iio/pressure/icp10100.c
@@ -0,0 +1,667 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 InvenSense, Inc.
+ *
+ * Driver for InvenSense ICP-1010xx barometric pressure and temperature sensor.
+ *
+ * Datasheet:
+ * http://www.invensense.com/wp-content/uploads/2018/01/DS-000186-ICP-101xx-v1.2.pdf
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/i2c.h>
+#include <linux/pm_runtime.h>
+#include <linux/crc8.h>
+#include <linux/mutex.h>
+#include <linux/delay.h>
+#include <linux/log2.h>
+#include <linux/math64.h>
+#include <linux/regulator/consumer.h>
+#include <linux/iio/iio.h>
+
+#define ICP10100_ID_REG_GET(_reg) ((_reg) & 0x003F)
+#define ICP10100_ID_REG 0x08
+#define ICP10100_RESPONSE_WORD_LENGTH 3
+#define ICP10100_CRC8_WORD_LENGTH 2
+#define ICP10100_CRC8_POLYNOMIAL 0x31
+#define ICP10100_CRC8_INIT 0xFF
+
+enum icp10100_chip {
+ ICP10100,
+};
+
+enum icp10100_mode {
+ ICP10100_MODE_LP, /* Low power mode: 1x sampling */
+ ICP10100_MODE_N, /* Normal mode: 2x sampling */
+ ICP10100_MODE_LN, /* Low noise mode: 4x sampling */
+ ICP10100_MODE_ULN, /* Ultra low noise mode: 8x sampling */
+ ICP10100_MODE_NB,
+};
+
+struct icp10100_state {
+ struct mutex lock;
+ struct i2c_client *client;
+ struct regulator *vdd;
+ enum icp10100_chip chip;
+ enum icp10100_mode mode;
+ int16_t cal[4];
+};
+
+struct icp10100_command {
+ __be16 cmd;
+ unsigned long wait_us;
+ unsigned long wait_max_us;
+ size_t response_word_nb;
+};
+
+static const struct icp10100_command icp10100_cmd_soft_reset = {
+ .cmd = cpu_to_be16(0x805D),
+ .wait_us = 170,
+ .wait_max_us = 200,
+ .response_word_nb = 0,
+};
+static const struct icp10100_command icp10100_cmd_read_id = {
+ .cmd = cpu_to_be16(0xEFC8),
+ .wait_us = 0,
+ .response_word_nb = 1,
+};
+static const struct icp10100_command icp10100_cmd_read_otp = {
+ .cmd = cpu_to_be16(0xC7F7),
+ .wait_us = 0,
+ .response_word_nb = 1,
+};
+static const struct icp10100_command icp10100_cmd_measure[] = {
+ [ICP10100_MODE_LP] = {
+ .cmd = cpu_to_be16(0x401A),
+ .wait_us = 1800,
+ .wait_max_us = 2000,
+ .response_word_nb = 3,
+ },
+ [ICP10100_MODE_N] = {
+ .cmd = cpu_to_be16(0x48A3),
+ .wait_us = 6300,
+ .wait_max_us = 6500,
+ .response_word_nb = 3,
+ },
+ [ICP10100_MODE_LN] = {
+ .cmd = cpu_to_be16(0x5059),
+ .wait_us = 23800,
+ .wait_max_us = 24000,
+ .response_word_nb = 3,
+ },
+ [ICP10100_MODE_ULN] = {
+ .cmd = cpu_to_be16(0x58E0),
+ .wait_us = 94500,
+ .wait_max_us = 94700,
+ .response_word_nb = 3,
+ },
+};
+
+static const uint8_t icp10100_switch_mode_otp[] =
+ {0xC5, 0x95, 0x00, 0x66, 0x9c};
+
+DECLARE_CRC8_TABLE(icp10100_crc8_table);
+
+static inline int icp10100_i2c_xfer(struct i2c_adapter *adap,
+ struct i2c_msg *msgs, int num)
+{
+ int ret;
+
+ ret = i2c_transfer(adap, msgs, num);
+ if (ret < 0)
+ return ret;
+
+ if (ret == num)
+ ret = 0;
+ else
+ ret = -EIO;
+
+ return ret;
+}
+
+static int icp10100_send_cmd(struct icp10100_state *st,
+ const struct icp10100_command *cmd,
+ __be16 *buf, size_t buf_len)
+{
+ size_t size = cmd->response_word_nb * ICP10100_RESPONSE_WORD_LENGTH;
+ uint8_t data[16];
+ uint8_t *ptr;
+ uint8_t *buf_ptr = (uint8_t *)buf;
+ struct i2c_msg msgs[2] = {
+ {
+ .addr = st->client->addr,
+ .flags = 0,
+ .len = 2,
+ .buf = (uint8_t *)&cmd->cmd,
+ }, {
+ .addr = st->client->addr,
+ .flags = I2C_M_RD,
+ .len = size,
+ .buf = data,
+ },
+ };
+ uint8_t crc;
+ unsigned int i;
+ int ret;
+
+ if (size > sizeof(data))
+ return -EINVAL;
+
+ if (cmd->response_word_nb > 0 &&
+ (buf == NULL || buf_len < (cmd->response_word_nb * 2)))
+ return -EINVAL;
+
+ dev_dbg(&st->client->dev, "sending cmd %#x\n", be16_to_cpu(cmd->cmd));
+
+ if (cmd->response_word_nb > 0 && cmd->wait_us == 0) {
+ /* direct command-response without waiting */
+ ret = icp10100_i2c_xfer(st->client->adapter, msgs,
+ ARRAY_SIZE(msgs));
+ if (ret)
+ return ret;
+ } else {
+ /* transfer command write */
+ ret = icp10100_i2c_xfer(st->client->adapter, &msgs[0], 1);
+ if (ret)
+ return ret;
+ if (cmd->wait_us > 0)
+ usleep_range(cmd->wait_us, cmd->wait_max_us);
+ /* transfer response read if needed */
+ if (cmd->response_word_nb > 0) {
+ ret = icp10100_i2c_xfer(st->client->adapter, &msgs[1], 1);
+ if (ret)
+ return ret;
+ } else {
+ return 0;
+ }
+ }
+
+ /* process read words with crc checking */
+ for (i = 0; i < cmd->response_word_nb; ++i) {
+ ptr = &data[i * ICP10100_RESPONSE_WORD_LENGTH];
+ crc = crc8(icp10100_crc8_table, ptr, ICP10100_CRC8_WORD_LENGTH,
+ ICP10100_CRC8_INIT);
+ if (crc != ptr[ICP10100_CRC8_WORD_LENGTH]) {
+ dev_err(&st->client->dev, "crc error recv=%#x calc=%#x\n",
+ ptr[ICP10100_CRC8_WORD_LENGTH], crc);
+ return -EIO;
+ }
+ *buf_ptr++ = ptr[0];
+ *buf_ptr++ = ptr[1];
+ }
+
+ return 0;
+}
+
+static int icp10100_read_cal_otp(struct icp10100_state *st)
+{
+ __be16 val;
+ int i;
+ int ret;
+
+ /* switch into OTP read mode */
+ ret = i2c_master_send(st->client, icp10100_switch_mode_otp,
+ ARRAY_SIZE(icp10100_switch_mode_otp));
+ if (ret < 0)
+ return ret;
+ if (ret != ARRAY_SIZE(icp10100_switch_mode_otp))
+ return -EIO;
+
+ /* read 4 calibration values */
+ for (i = 0; i < 4; ++i) {
+ ret = icp10100_send_cmd(st, &icp10100_cmd_read_otp,
+ &val, sizeof(val));
+ if (ret)
+ return ret;
+ st->cal[i] = be16_to_cpu(val);
+ dev_dbg(&st->client->dev, "cal[%d] = %d\n", i, st->cal[i]);
+ }
+
+ return 0;
+}
+
+static int icp10100_init_chip(struct icp10100_state *st)
+{
+ __be16 val;
+ uint16_t id;
+ int ret;
+
+ /* read and check id */
+ ret = icp10100_send_cmd(st, &icp10100_cmd_read_id, &val, sizeof(val));
+ if (ret)
+ return ret;
+ id = ICP10100_ID_REG_GET(be16_to_cpu(val));
+ if (id != ICP10100_ID_REG) {
+ dev_err(&st->client->dev, "invalid id %#x\n", id);
+ return -ENODEV;
+ }
+
+ /* read calibration data from OTP */
+ ret = icp10100_read_cal_otp(st);
+ if (ret)
+ return ret;
+
+ /* reset chip */
+ ret = icp10100_send_cmd(st, &icp10100_cmd_soft_reset, NULL, 0);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int icp10100_get_measures(struct icp10100_state *st,
+ uint32_t *pressure, uint16_t *temperature)
+{
+ const struct icp10100_command *cmd;
+ __be16 measures[3];
+ int ret;
+
+ pm_runtime_get_sync(&st->client->dev);
+
+ mutex_lock(&st->lock);
+ cmd = &icp10100_cmd_measure[st->mode];
+ ret = icp10100_send_cmd(st, cmd, measures, sizeof(measures));
+ mutex_unlock(&st->lock);
+ if (ret)
+ goto error_measure;
+
+ *pressure = (be16_to_cpu(measures[0]) << 8) |
+ (be16_to_cpu(measures[1]) >> 8);
+ *temperature = be16_to_cpu(measures[2]);
+
+ pm_runtime_mark_last_busy(&st->client->dev);
+error_measure:
+ pm_runtime_put_autosuspend(&st->client->dev);
+ return ret;
+}
+
+static uint32_t icp10100_get_pressure(struct icp10100_state *st,
+ uint32_t raw_pressure, uint16_t raw_temp)
+{
+ static int32_t p_calib[] = {45000, 80000, 105000};
+ static int32_t lut_lower = 3670016;
+ static int32_t lut_upper = 12058624;
+ static int32_t inv_quadr_factor = 16777216;
+ static int32_t offset_factor = 2048;
+ int64_t val1, val2;
+ int32_t p_lut[3];
+ int32_t t, t_square;
+ int64_t a, b, c;
+ uint32_t pressure_mPa;
+
+ dev_dbg(&st->client->dev, "raw: pressure = %u, temp = %u\n",
+ raw_pressure, raw_temp);
+
+ /* compute p_lut values */
+ t = (int32_t)raw_temp - 32768;
+ t_square = t * t;
+ val1 = (int64_t)st->cal[0] * (int64_t)t_square;
+ p_lut[0] = lut_lower + (int32_t)div_s64(val1, inv_quadr_factor);
+ val1 = (int64_t)st->cal[1] * (int64_t)t_square;
+ p_lut[1] = offset_factor * st->cal[3] +
+ (int32_t)div_s64(val1, inv_quadr_factor);
+ val1 = (int64_t)st->cal[2] * (int64_t)t_square;
+ p_lut[2] = lut_upper + (int32_t)div_s64(val1, inv_quadr_factor);
+ dev_dbg(&st->client->dev, "p_lut = [%d, %d, %d]\n",
+ p_lut[0], p_lut[1], p_lut[2]);
+
+ /* compute a, b, c factors */
+ val1 = (int64_t)p_lut[0] * (int64_t)p_lut[1] *
+ (int64_t)(p_calib[0] - p_calib[1]) +
+ (int64_t)p_lut[1] * (int64_t)p_lut[2] *
+ (int64_t)(p_calib[1] - p_calib[2]) +
+ (int64_t)p_lut[2] * (int64_t)p_lut[0] *
+ (int64_t)(p_calib[2] - p_calib[0]);
+ val2 = (int64_t)p_lut[2] * (int64_t)(p_calib[0] - p_calib[1]) +
+ (int64_t)p_lut[0] * (int64_t)(p_calib[1] - p_calib[2]) +
+ (int64_t)p_lut[1] * (int64_t)(p_calib[2] - p_calib[0]);
+ c = div64_s64(val1, val2);
+ dev_dbg(&st->client->dev, "val1 = %lld, val2 = %lld, c = %lld\n",
+ val1, val2, c);
+ val1 = (int64_t)p_calib[0] * (int64_t)p_lut[0] -
+ (int64_t)p_calib[1] * (int64_t)p_lut[1] -
+ (int64_t)(p_calib[1] - p_calib[0]) * c;
+ val2 = (int64_t)p_lut[0] - (int64_t)p_lut[1];
+ a = div64_s64(val1, val2);
+ dev_dbg(&st->client->dev, "val1 = %lld, val2 = %lld, a = %lld\n",
+ val1, val2, a);
+ b = ((int64_t)p_calib[0] - a) * ((int64_t)p_lut[0] + c);
+ dev_dbg(&st->client->dev, "b = %lld\n", b);
+
+ /*
+ * pressure_Pa = a + (b / (c + raw_pressure))
+ * pressure_mPa = 1000 * pressure_Pa
+ */
+ pressure_mPa = 1000LL * a + div64_s64(1000LL * b, c + raw_pressure);
+
+ return pressure_mPa;
+}
+
+static int icp10100_read_raw(struct iio_dev *indio_dev,
+ struct iio_chan_spec const *chan,
+ int *val, int *val2, long mask)
+{
+ struct icp10100_state *st = iio_priv(indio_dev);
+ uint32_t raw_pressure;
+ uint16_t raw_temp;
+ uint32_t pressure_mPa;
+ int ret;
+
+ switch (mask) {
+ case IIO_CHAN_INFO_RAW:
+ case IIO_CHAN_INFO_PROCESSED:
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
+ ret = icp10100_get_measures(st, &raw_pressure, &raw_temp);
+ if (ret)
+ goto out_info_proc;
+ switch (chan->type) {
+ case IIO_PRESSURE:
+ pressure_mPa = icp10100_get_pressure(st, raw_pressure,
+ raw_temp);
+ /* mPa to kPa */
+ *val = pressure_mPa / 1000000;
+ *val2 = pressure_mPa % 1000000;
+ ret = IIO_VAL_INT_PLUS_MICRO;
+ break;
+ case IIO_TEMP:
+ *val = raw_temp;
+ ret = IIO_VAL_INT;
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ };
+out_info_proc:
+ iio_device_release_direct_mode(indio_dev);
+ return ret;
+ case IIO_CHAN_INFO_SCALE:
+ switch (chan->type) {
+ case IIO_TEMP:
+ /* 1000 * 175°C / 65536 in m°C */
+ *val = 2;
+ *val2 = 670288;
+ return IIO_VAL_INT_PLUS_MICRO;
+ default:
+ return -EINVAL;
+ }
+ break;
+ case IIO_CHAN_INFO_OFFSET:
+ switch (chan->type) {
+ case IIO_TEMP:
+ /* 1000 * -45°C in m°C */
+ *val = -45000;
+ return IIO_VAL_INT;
+ default:
+ return -EINVAL;
+ }
+ break;
+ case IIO_CHAN_INFO_OVERSAMPLING_RATIO:
+ mutex_lock(&st->lock);
+ *val = 1 << st->mode;
+ mutex_unlock(&st->lock);
+ return IIO_VAL_INT;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int icp10100_read_avail(struct iio_dev *indio_dev,
+ struct iio_chan_spec const *chan,
+ const int **vals, int *type, int *length,
+ long mask)
+{
+ static int oversamplings[] = {1, 2, 4, 8};
+
+ switch (mask) {
+ case IIO_CHAN_INFO_OVERSAMPLING_RATIO:
+ *vals = oversamplings;
+ *type = IIO_VAL_INT;
+ *length = ARRAY_SIZE(oversamplings);
+ return IIO_AVAIL_LIST;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int icp10100_write_raw(struct iio_dev *indio_dev,
+ struct iio_chan_spec const *chan,
+ int val, int val2, long mask)
+{
+ struct icp10100_state *st = iio_priv(indio_dev);
+ unsigned int mode;
+ int ret;
+
+ switch (mask) {
+ case IIO_CHAN_INFO_OVERSAMPLING_RATIO:
+ /* oversampling is always positive and a power of 2 */
+ if (val <= 0 || !is_power_of_2(val))
+ return -EINVAL;
+ mode = ilog2(val);
+ if (mode >= ICP10100_MODE_NB)
+ return -EINVAL;
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
+ mutex_lock(&st->lock);
+ st->mode = mode;
+ mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int icp10100_write_raw_get_fmt(struct iio_dev *indio_dev,
+ struct iio_chan_spec const *chan,
+ long mask)
+{
+ switch (mask) {
+ case IIO_CHAN_INFO_OVERSAMPLING_RATIO:
+ return IIO_VAL_INT;
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct iio_info icp10100_info = {
+ .read_raw = icp10100_read_raw,
+ .read_avail = icp10100_read_avail,
+ .write_raw = icp10100_write_raw,
+ .write_raw_get_fmt = icp10100_write_raw_get_fmt,
+};
+
+static const struct iio_chan_spec icp10100_channels[] = {
+ {
+ .type = IIO_PRESSURE,
+ .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+ .info_mask_shared_by_all =
+ BIT(IIO_CHAN_INFO_OVERSAMPLING_RATIO),
+ .info_mask_shared_by_all_available =
+ BIT(IIO_CHAN_INFO_OVERSAMPLING_RATIO),
+ }, {
+ .type = IIO_TEMP,
+ .info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
+ BIT(IIO_CHAN_INFO_SCALE) |
+ BIT(IIO_CHAN_INFO_OFFSET),
+ .info_mask_shared_by_all =
+ BIT(IIO_CHAN_INFO_OVERSAMPLING_RATIO),
+ .info_mask_shared_by_all_available =
+ BIT(IIO_CHAN_INFO_OVERSAMPLING_RATIO),
+ },
+};
+
+static int icp10100_enable_regulator(struct icp10100_state *st)
+{
+ int ret;
+
+ ret = regulator_enable(st->vdd);
+ if (ret)
+ return ret;
+ msleep(100);
+
+ return 0;
+}
+
+static void icp10100_disable_regulator_action(void *data)
+{
+ struct icp10100_state *st = data;
+ int ret;
+
+ ret = regulator_disable(st->vdd);
+ if (ret)
+ dev_err(&st->client->dev, "error %d disabling vdd\n", ret);
+}
+
+static void icp10100_pm_disable(void *data)
+{
+ struct device *dev = data;
+
+ pm_runtime_put_sync_suspend(dev);
+ pm_runtime_disable(dev);
+}
+
+static int icp10100_probe(struct i2c_client *client,
+ const struct i2c_device_id *id)
+{
+ const void *match;
+ enum icp10100_chip chip;
+ const char *name;
+ struct iio_dev *indio_dev;
+ struct icp10100_state *st;
+ int ret;
+
+ if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) {
+ dev_err(&client->dev, "plain i2c transactions not supported\n");
+ return -ENODEV;
+ }
+
+ match = device_get_match_data(&client->dev);
+ if (match) {
+ chip = (enum icp10100_chip)match;
+ name = client->name;
+ } else if (id) {
+ chip = (enum icp10100_chip)id->driver_data;
+ name = id->name;
+ } else {
+ return -ENOSYS;
+ }
+
+ indio_dev = devm_iio_device_alloc(&client->dev, sizeof(*st));
+ if (!indio_dev)
+ return -ENOMEM;
+ i2c_set_clientdata(client, indio_dev);
+ indio_dev->dev.parent = &client->dev;
+ indio_dev->name = name;
+ indio_dev->modes = INDIO_DIRECT_MODE;
+ indio_dev->channels = icp10100_channels;
+ indio_dev->num_channels = ARRAY_SIZE(icp10100_channels);
+ indio_dev->info = &icp10100_info;
+
+ st = iio_priv(indio_dev);
+ mutex_init(&st->lock);
+ st->client = client;
+ st->chip = chip;
+ st->mode = ICP10100_MODE_N;
+
+ st->vdd = devm_regulator_get(&client->dev, "vdd");
+ if (IS_ERR(st->vdd))
+ return PTR_ERR(st->vdd);
+ ret = icp10100_enable_regulator(st);
+ if (ret)
+ return ret;
+ ret = devm_add_action_or_reset(&client->dev,
+ icp10100_disable_regulator_action, st);
+ if (ret)
+ return ret;
+
+ /* has to be done before the first i2c communication */
+ crc8_populate_msb(icp10100_crc8_table, ICP10100_CRC8_POLYNOMIAL);
+
+ ret = icp10100_init_chip(st);
+ if (ret) {
+ dev_err(&client->dev, "init chip error %d\n", ret);
+ return ret;
+ }
+
+ /* enable runtime pm with autosuspend delay of 2s */
+ pm_runtime_get_noresume(&client->dev);
+ pm_runtime_set_active(&client->dev);
+ pm_runtime_enable(&client->dev);
+ pm_runtime_set_autosuspend_delay(&client->dev, 2000);
+ pm_runtime_use_autosuspend(&client->dev);
+ pm_runtime_put(&client->dev);
+ ret = devm_add_action_or_reset(&client->dev, icp10100_pm_disable,
+ &client->dev);
+ if (ret)
+ return ret;
+
+ return devm_iio_device_register(&client->dev, indio_dev);
+}
+
+static int __maybe_unused icp10100_suspend(struct device *dev)
+{
+ struct icp10100_state *st = iio_priv(dev_get_drvdata(dev));
+ int ret;
+
+ mutex_lock(&st->lock);
+ ret = regulator_disable(st->vdd);
+ mutex_unlock(&st->lock);
+
+ return ret;
+}
+
+static int __maybe_unused icp10100_resume(struct device *dev)
+{
+ struct icp10100_state *st = iio_priv(dev_get_drvdata(dev));
+ int ret;
+
+ mutex_lock(&st->lock);
+
+ ret = icp10100_enable_regulator(st);
+ if (ret)
+ goto out_unlock;
+
+ /* reset chip */
+ ret = icp10100_send_cmd(st, &icp10100_cmd_soft_reset, NULL, 0);
+
+out_unlock:
+ mutex_unlock(&st->lock);
+ return ret;
+}
+
+static UNIVERSAL_DEV_PM_OPS(icp10100_pm, icp10100_suspend, icp10100_resume,
+ NULL);
+
+static const struct of_device_id icp10100_of_match[] = {
+ {
+ .compatible = "invensense,icp10100",
+ .data = (void *)ICP10100,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(of, icp10100_of_match);
+
+static const struct i2c_device_id icp10100_id[] = {
+ { "icp10100", ICP10100 },
+ { }
+};
+MODULE_DEVICE_TABLE(i2c, icp10100_id);
+
+static struct i2c_driver icp10100_driver = {
+ .driver = {
+ .name = "icp10100",
+ .pm = &icp10100_pm,
+ .of_match_table = of_match_ptr(icp10100_of_match),
+ },
+ .probe = icp10100_probe,
+ .id_table = icp10100_id,
+};
+module_i2c_driver(icp10100_driver);
+
+MODULE_AUTHOR("InvenSense, Inc.");
+MODULE_DESCRIPTION("InvenSense icp10100 driver");
+MODULE_LICENSE("GPL");
--
2.17.1
^ permalink raw reply related
* [PATCH RESEND v6 07/16] powerpc/mm: Use helper fault_signal_pending()
From: Peter Xu @ 2020-02-20 16:02 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Peter Xu, Martin Cracauer, Mike Rapoport, Hugh Dickins,
Jerome Glisse, Kirill A . Shutemov, Matthew Wilcox,
Pavel Emelyanov, Brian Geffon, Maya Gokhale, Denis Plotnikov,
Andrea Arcangeli, Johannes Weiner, Dr . David Alan Gilbert,
Linus Torvalds, Mike Kravetz, Marty McFadden, David Hildenbrand,
Bobby Powers, Mel Gorman
In-Reply-To: <20200220155353.8676-1-peterx@redhat.com>
Let powerpc code to use the new helper, by moving the signal handling
earlier before the retry logic.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/powerpc/mm/fault.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8db0507619e2..0868172ce4e3 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -582,6 +582,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
major |= fault & VM_FAULT_MAJOR;
+ if (fault_signal_pending(fault, regs))
+ return user_mode(regs) ? 0 : SIGBUS;
+
/*
* Handle the retry right now, the mmap_sem has been released in that
* case.
@@ -595,15 +598,8 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
*/
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
- if (!fatal_signal_pending(current))
- goto retry;
+ goto retry;
}
-
- /*
- * User mode? Just return to handle the fatal exception otherwise
- * return to bad_page_fault
- */
- return is_user ? 0 : SIGBUS;
}
up_read(¤t->mm->mmap_sem);
--
2.24.1
^ permalink raw reply related
* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
From: Alberto Garcia @ 2020-02-20 16:04 UTC (permalink / raw)
To: Eric Blake, Max Reitz, qemu-devel
Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov,
Vladimir Sementsov-Ogievskiy, qemu-block
In-Reply-To: <fcaace04-17be-66b2-e0aa-6b1c68b11989@redhat.com>
On Thu 20 Feb 2020 05:02:22 PM CET, Eric Blake wrote:
>>> + Bits are assigned starting from the most significant one.
>>> + (i.e. bit x is used for subcluster 31 - x)
>>
>> I still prefer it the other way round, both personally (e.g. it’s the
>> C ordering), and because other places in qcow2 use LSb for bit
>> ordering (the refcount order).
>
> Internal consistency with refcount order using LSb ordering is the
> strongest reason to flip things, and have bit x be subcluster x.
Ok, I think you're both right, I'll change that.
Berto
^ permalink raw reply
* Re: [PATCH v4 6/6] regulator: Use driver_deferred_probe_timeout for regulator_init_complete_work
From: Bjorn Andersson @ 2020-02-20 16:05 UTC (permalink / raw)
To: John Stultz
Cc: lkml, Rob Herring, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
Pavel Machek, Len Brown, Todd Kjos, Liam Girdwood, Mark Brown,
Thierry Reding, Linus Walleij, Greg Kroah-Hartman, linux-pm
In-Reply-To: <20200220050440.45878-7-john.stultz@linaro.org>
On Wed 19 Feb 21:04 PST 2020, John Stultz wrote:
> The regulator_init_complete_work logic defers the cleanup for an
> arbitrary 30 seconds of time to allow modules loaded by userland
> to start.
>
> This arbitrary timeout is similar to the
> driver_deferred_probe_timeout value, and its been suggested we
> align these so users have a method to extend the timeouts as
> needed.
>
> So this patch changes the logic to use the
> driver_deferred_probe_timeout value if it is set, otherwise we
> directly call the regulator_init_complete_work_function().
>
> Cc: Rob Herring <robh@kernel.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Kevin Hilman <khilman@kernel.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: Liam Girdwood <lgirdwood@gmail.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Thierry Reding <treding@nvidia.com>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Change-Id: I9fa2411abbb91ed4dd0edc41e8cc8583577c005b
Change-Id...
> ---
> v4:
> * Split out into its own patch, as suggested by Mark
> ---
> drivers/regulator/core.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index d015d99cb59d..394e7b11576a 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -5767,18 +5767,21 @@ static int __init regulator_init_complete(void)
> has_full_constraints = true;
>
> /*
> - * We punt completion for an arbitrary amount of time since
> - * systems like distros will load many drivers from userspace
> - * so consumers might not always be ready yet, this is
> - * particularly an issue with laptops where this might bounce
> - * the display off then on. Ideally we'd get a notification
> - * from userspace when this happens but we don't so just wait
> - * a bit and hope we waited long enough. It'd be better if
> - * we'd only do this on systems that need it, and a kernel
> - * command line option might be useful.
> + * If driver_deferred_probe_timeout is set, we punt
> + * completion for that many seconds since systems like
> + * distros will load many drivers from userspace so consumers
> + * might not always be ready yet, this is particularly an
> + * issue with laptops where this might bounce the display off
> + * then on. Ideally we'd get a notification from userspace
> + * when this happens but we don't so just wait a bit and hope
> + * we waited long enough. It'd be better if we'd only do
> + * this on systems that need it.
> */
> - schedule_delayed_work(®ulator_init_complete_work,
> - msecs_to_jiffies(30000));
> + if (driver_deferred_probe_timeout >= 0)
> + schedule_delayed_work(®ulator_init_complete_work,
> + driver_deferred_probe_timeout * HZ);
> + else
> + regulator_init_complete_work_function(NULL);
Why not schedule_delayed_work(..., 0) in this case, to get it off the
initcall context and to avoid the difference in execution paths?
Regards,
Bjorn
>
> return 0;
> }
> --
> 2.17.1
>
^ permalink raw reply
* [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
From: Halil Pasic @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Marek Szyprowski, Robin Murphy,
Christoph Hellwig
Cc: Halil Pasic, linux-s390, virtualization, linux-kernel, iommu,
Christian Borntraeger, Janosch Frank, Viktor Mihajlovski,
Cornelia Huck, Ram Pai, Thiago Jung Bauermann, David Gibson,
Lendacky, Thomas, Michael Mueller
In-Reply-To: <20200220160606.53156-1-pasic@linux.ibm.com>
Currently the advanced guest memory protection technologies (AMD SEV,
powerpc secure guest technology and s390 Protected VMs) abuse the
VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
is in turn necessary, to make IO work with guest memory protection.
But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
different beast: with virtio devices whose implementation runs on an SMP
CPU we are still fine with doing all the usual optimizations, it is just
that we need to make sure that the memory protection mechanism does not
get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
side of the guest (and possibly he host side as well) than we actually
need.
An additional benefit of teaching the guest to make the right decision
(and use DMA API) on it's own is: removing the need, to mandate special
VM configuration for guests that may run with protection. This is
especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
the virtio control structures into the first 2G of guest memory:
something we don't necessarily want to do per-default.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
---
drivers/virtio/virtio_ring.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 867c7ebd3f10..fafc8f924955 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (!virtio_has_iommu_quirk(vdev))
return true;
+ if (force_dma_unencrypted(&vdev->dev))
+ return true;
+
/* Otherwise, we are left to guess. */
/*
* In theory, it's possible to have a buggy QEMU-supposed
--
2.17.1
^ permalink raw reply related
* Re: [Lsf-pc] [LSF/MM TOPIC] Memory cgroups, whether you like it or not
From: Kirill A. Shutemov @ 2020-02-20 16:06 UTC (permalink / raw)
To: Michal Hocko
Cc: Tim Chen, lsf-pc, linux-mm, Dave Hansen, Dan Williams, Huang Ying
In-Reply-To: <20200214104541.GT31689@dhcp22.suse.cz>
On Fri, Feb 14, 2020 at 11:45:41AM +0100, Michal Hocko wrote:
> On Wed 05-02-20 10:34:57, Tim Chen wrote:
> > There is existing infrastructure for memory soft limit per cgroup we
> > can leverage to implement such a scheme. We'll like to find out if this
> > approach makes sense for people working on systems with multiple memory tiers.
>
> Soft limit is dead.
Michal, could you remind what the deal with soft limit? Why is it dead?
--
Kirill A. Shutemov
^ permalink raw reply
* Re: MCE handler gets NIP wrong on MPC8378
From: Radu Rendec @ 2020-02-20 16:02 UTC (permalink / raw)
To: Christophe Leroy; +Cc: linuxppc-dev
In-Reply-To: <09e9a042-766c-d2e6-2300-cebc372cabde@c-s.fr>
On 02/20/2020 at 3:38 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> On 02/19/2020 10:39 PM, Radu Rendec wrote:
> > On 02/19/2020 at 4:21 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>> Interesting.
> >>>
> >>> 0x900 is the adress of the timer interrupt.
> >>>
> >>> Would the MCE occur just after the timer interrupt ?
> >
> > I doubt that. I'm using a small test module to artificially trigger the
> > MCE. Basically it's just this (the full code is in my original post):
> >
> > bad_addr_base = ioremap(0xf0000000, 0x100);
> > x = ioread32(bad_addr_base);
> >
> > I find it hard to believe that every time I load the module the lwbrx
> > instruction that triggers the MCE is executed exactly after the timer
> > interrupt (or that the timer interrupt always occurs close to the lwbrx
> > instruction).
>
> Can you try to see how much time there is between your read and the MCE ?
> The below should allow it, you'll see first value in r13 and the other
> in r14 (mce.c is your test code)
>
> Also provide the timebase frequency as reported in /proc/cpuinfo
I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
# cat /proc/cpuinfo
processor : 0
cpu : e300c4
clock : 800.000004MHz
revision : 1.1 (pvr 8086 1011)
bogomips : 200.00
timebase : 100000000
The difference between r14 and r13 is 0x20000b. Assuming TB is
incremented with 'timebase' frequency, that means 20.97 milliseconds
(although the e300 manual says TB is "incremented once every four core
input clock cycles").
I repeated the test twice and the absolute values were of course very
different, but r14-r13 was 0x20000c and 0x200011, so it seems to be
quite consistent (within just a few clock cycles).
Just for the fun of it, I repeated the test once more, but with
interrupts disabled. The difference was 0x200014. FWIW, I disabled
interrupts before sampling TB in r13.
> And what's the reason given in the Oops message for the machine check ?
> Is that "Caused by (from SRR1=49030): Transfer error ack signal" or
> something else ?
When interrupts are enabled:
Caused by (from SRR1=41000): Transfer error ack signal
When interrupts are disabled:
Caused by (from SRR1=41030): Transfer error ack signal
> >
> >> Do you use the local bus monitoring driver ?
> >
> > I don't. In fact, I'm not even aware of it. What driver is that?
>
> CONFIG_FSL_LBC
OK, it seems I'm actually using it. I haven't enabled it explicitly, but
it's automatically pulled by CONFIG_MTD_NAND_FSL_ELBC as a prerequisite.
I looked at the code in arch/powerpc/sysdev/fsl_lbc.c and it's quite
small. Most of the code is in fsl_lbc_ctrl_irq, which I guess is
supposed to print a message if/when the LBC catches an error. I've never
seen any of those messages being printed.
Best regards,
Radu
^ permalink raw reply
* [igt-dev] ✓ Fi.CI.BAT: success for lib/i915: Restrict mmap types to GTT if no MMAP_OFFSET support
From: Patchwork @ 2020-02-20 16:04 UTC (permalink / raw)
To: Janusz Krzysztofik; +Cc: igt-dev
In-Reply-To: <20200220153203.29571-1-janusz.krzysztofik@linux.intel.com>
== Series Details ==
Series: lib/i915: Restrict mmap types to GTT if no MMAP_OFFSET support
URL : https://patchwork.freedesktop.org/series/73719/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_7973 -> IGTPW_4198
====================================================
Summary
-------
**SUCCESS**
No regressions found.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/index.html
New tests
---------
New tests have been introduced between CI_DRM_7973 and IGTPW_4198:
### New IGT tests (4) ###
* igt@i915_pm_backlight@basic-brightness:
- Statuses : 1 dmesg-warn(s) 17 pass(s) 23 skip(s)
- Exec time: [0.0, 0.23] s
* igt@i915_pm_rpm@basic-pci-d3-state:
- Statuses : 1 dmesg-warn(s) 29 pass(s) 11 skip(s)
- Exec time: [0.0, 10.49] s
* igt@i915_pm_rpm@basic-rte:
- Statuses : 1 dmesg-warn(s) 29 pass(s) 11 skip(s)
- Exec time: [0.44, 11.29] s
* igt@i915_pm_rps@basic-api:
- Statuses : 36 pass(s) 5 skip(s)
- Exec time: [0.0, 0.01] s
Known issues
------------
Here are the changes found in IGTPW_4198 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@gem_close_race@basic-threads:
- fi-byt-n2820: [PASS][1] -> [INCOMPLETE][2] ([i915#45])
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-byt-n2820/igt@gem_close_race@basic-threads.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-byt-n2820/igt@gem_close_race@basic-threads.html
* igt@gem_mmap_gtt@basic:
- fi-tgl-y: [PASS][3] -> [DMESG-WARN][4] ([CI#94] / [i915#402]) +1 similar issue
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-tgl-y/igt@gem_mmap_gtt@basic.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-tgl-y/igt@gem_mmap_gtt@basic.html
* igt@i915_selftest@live_execlists:
- fi-tgl-y: [PASS][5] -> [INCOMPLETE][6] ([CI#94] / [i915#647])
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-tgl-y/igt@i915_selftest@live_execlists.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-tgl-y/igt@i915_selftest@live_execlists.html
#### Possible fixes ####
* igt@i915_selftest@live_gem_contexts:
- fi-cml-s: [DMESG-FAIL][7] ([i915#877]) -> [PASS][8]
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-cml-s/igt@i915_selftest@live_gem_contexts.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-cml-s/igt@i915_selftest@live_gem_contexts.html
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-icl-u2: [FAIL][9] ([i915#217]) -> [PASS][10]
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
* igt@prime_self_import@basic-llseek-bad:
- fi-tgl-y: [DMESG-WARN][11] ([CI#94] / [i915#402]) -> [PASS][12] +1 similar issue
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html
#### Warnings ####
* igt@amdgpu/amd_prime@amd-to-i915:
- fi-icl-u3: [SKIP][13] ([fdo#109315]) -> [SKIP][14] ([fdo#109315] / [i915#585])
[13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-icl-u3/igt@amdgpu/amd_prime@amd-to-i915.html
[14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-icl-u3/igt@amdgpu/amd_prime@amd-to-i915.html
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-kbl-7500u: [FAIL][15] ([fdo#111407]) -> [FAIL][16] ([fdo#111096] / [i915#323])
[15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7973/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
[16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
[CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
[fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
[fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
[fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
[i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
[i915#323]: https://gitlab.freedesktop.org/drm/intel/issues/323
[i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
[i915#45]: https://gitlab.freedesktop.org/drm/intel/issues/45
[i915#585]: https://gitlab.freedesktop.org/drm/intel/issues/585
[i915#647]: https://gitlab.freedesktop.org/drm/intel/issues/647
[i915#877]: https://gitlab.freedesktop.org/drm/intel/issues/877
Participating hosts (49 -> 46)
------------------------------
Additional (4): fi-kbl-soraka fi-skl-lmem fi-ivb-3770 fi-pnv-d510
Missing (7): fi-ilk-m540 fi-hsw-4200u fi-skl-6770hq fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus
Build changes
-------------
* CI: CI-20190529 -> None
* IGT: IGT_5453 -> IGTPW_4198
CI-20190529: 20190529
CI_DRM_7973: 07350317e4b2be54b1de7f1e73f77875df5e43f3 @ git://anongit.freedesktop.org/gfx-ci/linux
IGTPW_4198: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/index.html
IGT_5453: cae9a5881ed2c5be2c2518a255740b612a927f9a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4198/index.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev
^ permalink raw reply
* Re: [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function
From: Max Reitz @ 2020-02-20 16:03 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel
Cc: Kevin Wolf, Anton Nefedov, qemu-block,
Vladimir Sementsov-Ogievskiy, Denis V . Lunev
In-Reply-To: <bebe4058df5210ac3293e917ad6b61abac398f60.1577014346.git.berto@igalia.com>
[-- Attachment #1.1: Type: text/plain, Size: 542 bytes --]
On 22.12.19 12:36, Alberto Garcia wrote:
> This function will be used by the qcow2 code to check if an image has
> subclusters or not.
>
> At the moment this simply returns false. Once all patches needed for
> subcluster support are ready then QEMU will be able to create and
> read images with subclusters and this function will return the actual
> value.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
> block/qcow2.h | 6 ++++++
> 1 file changed, 6 insertions(+)
Reviewed-by: Max Reitz <mreitz@redhat.com>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* [PATCH RESEND v6 09/16] mm: Return faster for non-fatal signals in user mode faults
From: Peter Xu @ 2020-02-20 16:02 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Peter Xu, Martin Cracauer, Mike Rapoport, Hugh Dickins,
Jerome Glisse, Kirill A . Shutemov, Matthew Wilcox,
Pavel Emelyanov, Brian Geffon, Maya Gokhale, Denis Plotnikov,
Andrea Arcangeli, Johannes Weiner, Dr . David Alan Gilbert,
Linus Torvalds, Mike Kravetz, Marty McFadden, David Hildenbrand,
Bobby Powers, Mel Gorman
In-Reply-To: <20200220155353.8676-1-peterx@redhat.com>
The idea comes from the upstream discussion between Linus and Andrea:
https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
A summary to the issue: there was a special path in handle_userfault()
in the past that we'll return a VM_FAULT_NOPAGE when we detected
non-fatal signals when waiting for userfault handling. We did that by
reacquiring the mmap_sem before returning. However that brings a risk
in that the vmas might have changed when we retake the mmap_sem and
even we could be holding an invalid vma structure.
This patch is a preparation of removing that special path by allowing
the page fault to return even faster if we were interrupted by a
non-fatal signal during a user-mode page fault handling routine.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/sched/signal.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 4c87ffce64d1..09d40ce6a162 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -379,7 +379,8 @@ static inline bool fault_signal_pending(unsigned int fault_flags,
struct pt_regs *regs)
{
return unlikely((fault_flags & VM_FAULT_RETRY) &&
- fatal_signal_pending(current));
+ (fatal_signal_pending(current) ||
+ (user_mode(regs) && signal_pending(current))));
}
/*
--
2.24.1
^ permalink raw reply related
* Re: omap-secure.c:undefined reference to `__arm_smccc_smc'
From: Andrew F. Davis @ 2020-02-20 16:03 UTC (permalink / raw)
To: Tony Lindgren; +Cc: kbuild-all, linux-kernel, kbuild test robot
In-Reply-To: <20200220155429.GH37466@atomide.com>
On 2/20/20 10:54 AM, Tony Lindgren wrote:
> Andrew,
>
> * kbuild test robot <lkp@intel.com> [200213 10:27]:
>> tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> head: 0bf999f9c5e74c7ecf9dafb527146601e5c848b9
>> commit: c37baa06f8a970e4a533d41f7d33e5e57de5ad25 ARM: OMAP2+: Fix undefined reference to omap_secure_init
>> date: 3 weeks ago
>> config: arm-randconfig-a001-20200213 (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (GCC) 7.5.0
>> reproduce:
>> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> git checkout c37baa06f8a970e4a533d41f7d33e5e57de5ad25
>> # save the attached .config to linux build tree
>> GCC_VERSION=7.5.0 make.cross ARCH=arm
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kbuild test robot <lkp@intel.com>
>>
>> All errors (new ones prefixed by >>):
>>
>> arch/arm/mach-omap2/omap-secure.o: In function `omap_smccc_smc':
>>>> omap-secure.c:(.text+0x94): undefined reference to `__arm_smccc_smc'
>
> Have you looked at this one? Looks like there's still an unhandled
> randconfig build case.
>
I've had a quick look, all the ARM config does:
select HAVE_ARM_SMCCC if CPU_V7
so I don't think this will happen in any real config, but if we want to
prevent randconfig issue this we could force ARCH_OMAP2PLUS to "depend"
on it.
Andrew
> Regards,
>
> Tony
>
^ permalink raw reply
* Re: omap-secure.c:undefined reference to `__arm_smccc_smc'
From: Andrew F. Davis @ 2020-02-20 16:03 UTC (permalink / raw)
To: kbuild-all
In-Reply-To: <20200220155429.GH37466@atomide.com>
[-- Attachment #1: Type: text/plain, Size: 1528 bytes --]
On 2/20/20 10:54 AM, Tony Lindgren wrote:
> Andrew,
>
> * kbuild test robot <lkp@intel.com> [200213 10:27]:
>> tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> head: 0bf999f9c5e74c7ecf9dafb527146601e5c848b9
>> commit: c37baa06f8a970e4a533d41f7d33e5e57de5ad25 ARM: OMAP2+: Fix undefined reference to omap_secure_init
>> date: 3 weeks ago
>> config: arm-randconfig-a001-20200213 (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (GCC) 7.5.0
>> reproduce:
>> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> git checkout c37baa06f8a970e4a533d41f7d33e5e57de5ad25
>> # save the attached .config to linux build tree
>> GCC_VERSION=7.5.0 make.cross ARCH=arm
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kbuild test robot <lkp@intel.com>
>>
>> All errors (new ones prefixed by >>):
>>
>> arch/arm/mach-omap2/omap-secure.o: In function `omap_smccc_smc':
>>>> omap-secure.c:(.text+0x94): undefined reference to `__arm_smccc_smc'
>
> Have you looked at this one? Looks like there's still an unhandled
> randconfig build case.
>
I've had a quick look, all the ARM config does:
select HAVE_ARM_SMCCC if CPU_V7
so I don't think this will happen in any real config, but if we want to
prevent randconfig issue this we could force ARCH_OMAP2PLUS to "depend"
on it.
Andrew
> Regards,
>
> Tony
>
^ permalink raw reply
* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
From: Eric Blake @ 2020-02-20 16:02 UTC (permalink / raw)
To: Max Reitz, Alberto Garcia, qemu-devel
Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov,
Vladimir Sementsov-Ogievskiy, qemu-block
In-Reply-To: <7ff19f65-5148-a40a-9b7a-6a330cf7272e@redhat.com>
On 2/20/20 9:54 AM, Max Reitz wrote:
>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> + Bit 0 - 31: Allocation status (one bit per subcluster)
>> +
>> + 1: the subcluster is allocated. In this case the
>> + host cluster offset field must contain a valid
>> + offset.
>> + 0: the subcluster is not allocated. In this case
>> + read requests shall go to the backing file or
>> + return zeros if there is no backing file data.
>> +
>> + Bits are assigned starting from the most significant one.
>> + (i.e. bit x is used for subcluster 31 - x)
>
> I still prefer it the other way round, both personally (e.g. it’s the C
> ordering), and because other places in qcow2 use LSb for bit ordering
> (the refcount order).
Internal consistency with refcount order using LSb ordering is the
strongest reason to flip things, and have bit x be subcluster x.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
^ permalink raw reply
* Re: [PATCH 0/5] Support metric group constraint
From: Liang, Kan @ 2020-02-20 16:03 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
In-Reply-To: <20200220113924.GB565976@krava>
On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Some metric groups, e.g. Page_Walks_Utilization, will never count when
>> NMI watchdog is enabled.
>>
>> $echo 1 > /proc/sys/kernel/nmi_watchdog
>> $perf stat -M Page_Walks_Utilization
>>
>> Performance counter stats for 'system wide':
>>
>> <not counted> itlb_misses.walk_pending (0.00%)
>> <not counted> dtlb_load_misses.walk_pending (0.00%)
>> <not counted> dtlb_store_misses.walk_pending (0.00%)
>> <not counted> ept.walk_pending (0.00%)
>> <not counted> cycles (0.00%)
>>
>> 2.343460588 seconds time elapsed
>>
>> Some events weren't counted. Try disabling the NMI watchdog:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> The events in group usually have to be from the same PMU. Try
>> reorganizing the group.
>>
>> A metric group is a weak group, which relies on group validation
>> code in the kernel to determine whether to be opened as a group or
>> a non-group. However, group validation code may return false-positives,
>> especially when NMI watchdog is enabled. (The metric group is allowed
>> as a group but will never be scheduled.)
>>
>> The attempt to fix the group validation code has been rejected.
>> https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
>> Because we cannot accurately predict whether the group can be scheduled
>> as a group, only by checking current status.
>>
>> This patch set provides another solution to mitigate the issue.
>> Add "MetricConstraint" in event list, which provides a hint for perf tool,
>> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
>> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
>
> the problem is in the missing counter, that's taken by NMI watchdog, right?
>
> and it's problem for any metric that won't fit to the available
> counters.. shouldn't we rather do this workaround for any metric
> that wouldn't fit in available counters?
I think current perf already did this.
All metric groups are weak group. Kernel (validate_group()) tells perf
tool whether a metric group fit to available counters.
If yes, the metric group will be scheduled as a group.
If no, perf tool will not using a group and re-try. The code is as below.
try_again:
if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
/* Weak group failed. Reset the group. */
if ((errno == EINVAL || errno == EBADF) &&
counter->leader != counter &&
counter->weak_group) {
counter = perf_evlist__reset_weak_group(evsel_list, counter);
goto try_again;
}
This patch set is to workaroud the false-positives from the kernel.
Kernel only validate the group itself. It assumes that all counters are
available. So, when any counter is permanently occupied by others, e.g.
watchdog, the validate_group() may return false-positives.
? not just for chosen
> ones..?
For now, I think we only need to workaround the Page_Walks_Utilization
metric group. Because it has 5 events, and one of the events is cycles.
The cycles has to be scheduled on fixed counter 2. But it's already
occupied by watchdog.
The false-positives of validate_group() will trigger the issue (metric
group never be scheduled.).
For other metric groups, even they have cycles, the issue should not be
triggered.
For example, if they have 4 or less events, the cycles can be scheduled
to GP counter instead.
If they have 6 or more events, the weak group will be reject anyway.
Perf tool will open it as non-group (standalone metrics).
I think we only need to apply the constraint for the
Page_Walks_Utilization metric group for now.
Thanks,
Kan
>
> thanks,
> jirka
>
^ permalink raw reply
* [PATCH 2/2] drm/i915/execlists: Always force a context reload when rewinding RING_TAIL
From: Chris Wilson @ 2020-02-20 16:03 UTC (permalink / raw)
To: stable
In-Reply-To: <20200220160319.1919629-1-chris@chris-wilson.co.uk>
Upstream: b1339ecac661 ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
If we rewind the RING_TAIL on a context, due to a preemption event, we
must force the context restore for the RING_TAIL update to be properly
handled. Rather than note which preemption events may cause us to rewind
the tail, compare the new request's tail with the previously submitted
RING_TAIL, as it turns out that timeslicing was causing unexpected
rewinds.
<idle>-0 0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3
<idle>-0 0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964
<idle>-0 0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695
<idle>-0 0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
^---- Note we unwind 2 requests from the same context
<idle>-0 0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
<idle>-0 0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506
^---- But to apply the new timeslice, we have to replay the first request
before the new client can start -- the unexpected RING_TAIL rewind
<idle>-0 0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696*, 134:1508 }
synmark2-5425 2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0
synmark2-5425 2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000
^---- Preemption event for the ELSP update; note the lite-restore
synmark2-5425 2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 }
synmark2-5425 2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696*, 134:1508 }
synmark2-5425 2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695
synmark2-5425 2.... 1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695
synmark2-5425 2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2
synmark2-5425 2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000
synmark2-5425 2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 }
synmark2-5425 2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040
synmark2-5425 2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 }
synmark2-5425 2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Referenecs: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://patchwork.freedesktop.org/patch/msgid/20200207211452.2860634-1-chris@chris-wilson.co.uk
(cherry picked from commit 5ba32c7be81e53ea8a27190b0f6be98e6c6779af)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
---
drivers/gpu/drm/i915/gt/intel_engine.h | 8 ++++++++
drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 +
drivers/gpu/drm/i915/gt/intel_lrc.c | 18 ++++++++----------
drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 ++
4 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 22aab8593abf..926272b5a0ca 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -250,6 +250,14 @@ static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
return pos & (ring->size - 1);
}
+static inline int intel_ring_direction(const struct intel_ring *ring,
+ u32 next, u32 prev)
+{
+ typecheck(typeof(ring->size), next);
+ typecheck(typeof(ring->size), prev);
+ return (next - prev) << ring->wrap;
+}
+
static inline bool
intel_ring_offset_valid(const struct intel_ring *ring,
unsigned int pos)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 798e1b024406..c77c9518c58b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -107,6 +107,7 @@ struct intel_ring {
u32 space;
u32 size;
+ u32 wrap;
u32 effective_size;
};
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index e0e4f3deb2da..bf6addece25b 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -647,7 +647,7 @@ static u64 execlists_update_context(struct i915_request *rq)
{
struct intel_context *ce = rq->hw_context;
u64 desc = ce->lrc_desc;
- u32 tail;
+ u32 tail, prev;
/*
* WaIdleLiteRestore:bdw,skl
@@ -660,9 +660,15 @@ static u64 execlists_update_context(struct i915_request *rq)
* subsequent resubmissions (for lite restore). Should that fail us,
* and we try and submit the same tail again, force the context
* reload.
+ *
+ * If we need to return to a preempted context, we need to skip the
+ * lite-restore and force it to reload the RING_TAIL. Otherwise, the
+ * HW has a tendency to ignore us rewinding the TAIL to the end of
+ * an earlier request.
*/
tail = intel_ring_set_tail(rq->ring, rq->tail);
- if (unlikely(ce->lrc_reg_state[CTX_RING_TAIL + 1] == tail))
+ prev = ce->lrc_reg_state[CTX_RING_TAIL + 1];
+ if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0))
desc |= CTX_DESC_FORCE_RESTORE;
ce->lrc_reg_state[CTX_RING_TAIL + 1] = tail;
rq->tail = rq->wa_tail;
@@ -1110,14 +1116,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
*/
__unwind_incomplete_requests(engine);
- /*
- * If we need to return to the preempted context, we
- * need to skip the lite-restore and force it to
- * reload the RING_TAIL. Otherwise, the HW has a
- * tendency to ignore us rewinding the TAIL to the
- * end of an earlier request.
- */
- last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE;
last = NULL;
} else if (need_timeslice(engine, last) &&
!timer_pending(&engine->execlists.timer)) {
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index bacaa7bb8c9a..eee9fcbe0434 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1312,6 +1312,8 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
kref_init(&ring->ref);
ring->size = size;
+ ring->wrap = BITS_PER_TYPE(ring->size) - ilog2(size);
+
/* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines
* of the buffer.
--
2.25.1
^ permalink raw reply related
* [PATCH 1/2] drm/i915/gt: Detect if we miss WaIdleLiteRestore
From: Chris Wilson @ 2020-02-20 16:03 UTC (permalink / raw)
To: stable
Upstream: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.
Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).
However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.
In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@kernel.vger.org #5.3+
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 42 ++++++++++++++---------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4949b5ad860f..e0e4f3deb2da 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -471,12 +471,6 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
return desc;
}
-static void unwind_wa_tail(struct i915_request *rq)
-{
- rq->tail = intel_ring_wrap(rq->ring, rq->wa_tail - WA_TAIL_BYTES);
- assert_ring_tail_valid(rq->ring, rq->tail);
-}
-
static struct i915_request *
__unwind_incomplete_requests(struct intel_engine_cs *engine)
{
@@ -495,7 +489,6 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
continue; /* XXX */
__i915_request_unsubmit(rq);
- unwind_wa_tail(rq);
/*
* Push the request back into the queue for later resubmission.
@@ -650,13 +643,29 @@ execlists_schedule_out(struct i915_request *rq)
i915_request_put(rq);
}
-static u64 execlists_update_context(const struct i915_request *rq)
+static u64 execlists_update_context(struct i915_request *rq)
{
struct intel_context *ce = rq->hw_context;
- u64 desc;
+ u64 desc = ce->lrc_desc;
+ u32 tail;
- ce->lrc_reg_state[CTX_RING_TAIL + 1] =
- intel_ring_set_tail(rq->ring, rq->tail);
+ /*
+ * WaIdleLiteRestore:bdw,skl
+ *
+ * We should never submit the context with the same RING_TAIL twice
+ * just in case we submit an empty ring, which confuses the HW.
+ *
+ * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of
+ * the normal request to be able to always advance the RING_TAIL on
+ * subsequent resubmissions (for lite restore). Should that fail us,
+ * and we try and submit the same tail again, force the context
+ * reload.
+ */
+ tail = intel_ring_set_tail(rq->ring, rq->tail);
+ if (unlikely(ce->lrc_reg_state[CTX_RING_TAIL + 1] == tail))
+ desc |= CTX_DESC_FORCE_RESTORE;
+ ce->lrc_reg_state[CTX_RING_TAIL + 1] = tail;
+ rq->tail = rq->wa_tail;
/*
* Make sure the context image is complete before we submit it to HW.
@@ -675,7 +684,6 @@ static u64 execlists_update_context(const struct i915_request *rq)
*/
mb();
- desc = ce->lrc_desc;
ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE;
return desc;
@@ -1150,16 +1158,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
if (!list_is_last(&last->sched.link,
&engine->active.requests))
return;
-
- /*
- * WaIdleLiteRestore:bdw,skl
- * Apply the wa NOOPs to prevent
- * ring:HEAD == rq:TAIL as we resubmit the
- * request. See gen8_emit_fini_breadcrumb() for
- * where we prepare the padding after the
- * end of the request.
- */
- last->tail = last->wa_tail;
}
}
--
2.25.1
^ permalink raw reply related
* [PATCH RESEND v6 13/16] mm: Allow VM_FAULT_RETRY for multiple times
From: Peter Xu @ 2020-02-20 16:02 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Peter Xu, Martin Cracauer, Mike Rapoport, Hugh Dickins,
Jerome Glisse, Kirill A . Shutemov, Matthew Wilcox,
Pavel Emelyanov, Brian Geffon, Maya Gokhale, Denis Plotnikov,
Andrea Arcangeli, Johannes Weiner, Dr . David Alan Gilbert,
Linus Torvalds, Mike Kravetz, Marty McFadden, David Hildenbrand,
Bobby Powers, Mel Gorman
In-Reply-To: <20200220155353.8676-1-peterx@redhat.com>
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We
achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle
the page fault on a single page. However that should hardly happen,
and after all for each code path to return a VM_FAULT_RETRY we'll
first wait for a condition (during which time we should possibly yield
the cpu) to happen before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the
FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means
that the page fault handler now can retry the page fault for multiple
times if necessary without the need to generate another page fault
event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page
fault handler can still identify whether a page fault is the first
attempt or not.
Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care
of the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to
detect the first retry of a page fault by checking against
both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag &
FAULT_FLAG_TRIED) because now even the 2nd try will have the
ALLOW_RETRY set, then use that helper in all existing special paths.
One example is in __lock_page_or_retry(), now we'll drop the mmap_sem
only in the first attempt of page fault and we'll keep it in follow up
retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time
a supporting material for the future userfaultfd-writeprotect work,
since in that work there will always be an explicit userfault
writeprotect retry for protected pages, and if that cannot resolve the
page fault (e.g., when userfaultfd-writeprotect is used in conjunction
with swapped pages) then we'll possibly need a 3rd retry of the page
fault. It might also benefit other potential users who will have
similar requirement like userfault write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/alpha/mm/fault.c | 2 +-
arch/arc/mm/fault.c | 1 -
arch/arm/mm/fault.c | 3 ---
arch/arm64/mm/fault.c | 5 -----
arch/hexagon/mm/vm_fault.c | 1 -
arch/ia64/mm/fault.c | 1 -
arch/m68k/mm/fault.c | 3 ---
arch/microblaze/mm/fault.c | 1 -
arch/mips/mm/fault.c | 1 -
arch/nds32/mm/fault.c | 1 -
arch/nios2/mm/fault.c | 3 ---
arch/openrisc/mm/fault.c | 1 -
arch/parisc/mm/fault.c | 4 +---
arch/powerpc/mm/fault.c | 6 ------
arch/riscv/mm/fault.c | 5 -----
arch/s390/mm/fault.c | 5 +----
arch/sh/mm/fault.c | 1 -
arch/sparc/mm/fault_32.c | 1 -
arch/sparc/mm/fault_64.c | 1 -
arch/um/kernel/trap.c | 1 -
arch/unicore32/mm/fault.c | 4 +---
arch/x86/mm/fault.c | 2 --
arch/xtensa/mm/fault.c | 1 -
drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++---
include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++
mm/filemap.c | 2 +-
mm/internal.h | 6 +++---
27 files changed, 54 insertions(+), 57 deletions(-)
diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index fcfa229cc1e7..c2d7b6d7bac7 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -169,7 +169,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
+ flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
* have already released it in __lock_page_or_retry
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 643fad774071..92b339c7adba 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -145,7 +145,6 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
*/
if (unlikely((fault & VM_FAULT_RETRY) &&
(flags & FAULT_FLAG_ALLOW_RETRY))) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 18ef0b143ac2..b598e6978b29 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -319,9 +319,6 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
regs, addr);
}
if (fault & VM_FAULT_RETRY) {
- /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation. */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index cbb29a43aa7f..1027851d469a 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -521,12 +521,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
}
if (fault & VM_FAULT_RETRY) {
- /*
- * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of
- * starvation.
- */
if (mm_flags & FAULT_FLAG_ALLOW_RETRY) {
- mm_flags &= ~FAULT_FLAG_ALLOW_RETRY;
mm_flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index d9e15d941bdb..72334b26317a 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -102,7 +102,6 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index b5aa4e80c762..30d0c1fca99e 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -167,7 +167,6 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 182799fd9987..f7afb9897966 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -162,9 +162,6 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation. */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 32da02778a63..3248141f8ed5 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -236,7 +236,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index f177da67d940..fd64b135fd7b 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -178,7 +178,6 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
tsk->min_flt++;
}
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 2810a4e5ab27..0cf0c08c7da2 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -246,7 +246,6 @@ void do_page_fault(unsigned long entry, unsigned long addr,
1, regs, addr);
}
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index c38bea4220fb..ec9d8a9c426f 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -157,9 +157,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation. */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 30d5c51e9d40..8af1cc78c4fb 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -181,7 +181,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
else
tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 8e88e5c5f26a..86e8c848f3d7 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -328,14 +328,12 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
-
/*
* No need to up_read(&mm->mmap_sem) as we would
* have already released it in __lock_page_or_retry
* in mm/filemap.c.
*/
-
+ flags |= FAULT_FLAG_TRIED;
goto retry;
}
}
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d7e1f8dc7e4c..d15f0f0ee806 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -590,13 +590,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* case.
*/
if (unlikely(fault & VM_FAULT_RETRY)) {
- /* We retry only once */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- /*
- * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation.
- */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index a252d9e38561..be84e32adc4c 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -144,11 +144,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
1, regs, addr);
}
if (fault & VM_FAULT_RETRY) {
- /*
- * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation.
- */
- flags &= ~(FAULT_FLAG_ALLOW_RETRY);
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 551ac311bd35..aeccdb30899a 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -513,10 +513,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
fault = VM_FAULT_PFAULT;
goto out_up;
}
- /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation. */
- flags &= ~(FAULT_FLAG_ALLOW_RETRY |
- FAULT_FLAG_RETRY_NOWAIT);
+ flags &= ~FAULT_FLAG_RETRY_NOWAIT;
flags |= FAULT_FLAG_TRIED;
down_read(&mm->mmap_sem);
goto retry;
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index d9c8f2d00a54..13ee4d20e622 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -481,7 +481,6 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
regs, address);
}
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a91b0c2d84f8..f6e0e601f857 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -261,7 +261,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
1, regs, address);
}
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 30653418a672..c0c0dd471b6b 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -449,7 +449,6 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
1, regs, address);
}
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index c59ad37eacda..8f18cf56b3dd 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -97,7 +97,6 @@ int handle_page_fault(unsigned long address, unsigned long ip,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 34a90453ca18..a9bd08fbe588 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -259,9 +259,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
else
tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
- /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
- * of starvation. */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
+ flags |= FAULT_FLAG_TRIED;
goto retry;
}
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b6f65333355..4ce647bbe546 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1457,8 +1457,6 @@ void do_user_addr_fault(struct pt_regs *regs,
*/
if (unlikely((fault & VM_FAULT_RETRY) &&
(flags & FAULT_FLAG_ALLOW_RETRY))) {
- /* Retry at most once */
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 7d196dc951e8..e7172bd53ced 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -128,7 +128,6 @@ void do_page_fault(struct pt_regs *regs)
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 389128b8c4dd..cb8829ca6c7f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -59,9 +59,10 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
/*
* If possible, avoid waiting for GPU with mmap_sem
- * held.
+ * held. We only do this if the fault allows retry and this
+ * is the first attempt.
*/
- if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
+ if (fault_flag_allow_retry_first(vmf->flags)) {
ret = VM_FAULT_RETRY;
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
goto out_unlock;
@@ -135,7 +136,12 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
* for the buffer to become unreserved.
*/
if (unlikely(!dma_resv_trylock(bo->base.resv))) {
- if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
+ /*
+ * If the fault allows retry and this is the first
+ * fault attempt, we try to release the mmap_sem
+ * before waiting
+ */
+ if (fault_flag_allow_retry_first(vmf->flags)) {
if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
ttm_bo_get(bo);
up_read(&vmf->vma->vm_mm->mmap_sem);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ff653f9136dd..51a886d50758 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -391,6 +391,25 @@ extern pgprot_t protection_map[16];
* @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
* @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
* @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ *
+ * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
+ * whether we would allow page faults to retry by specifying these two
+ * fault flags correctly. Currently there can be three legal combinations:
+ *
+ * (a) ALLOW_RETRY and !TRIED: this means the page fault allows retry, and
+ * this is the first try
+ *
+ * (b) ALLOW_RETRY and TRIED: this means the page fault allows retry, and
+ * we've already tried at least once
+ *
+ * (c) !ALLOW_RETRY and !TRIED: this means the page fault does not allow retry
+ *
+ * The unlisted combination (!ALLOW_RETRY && TRIED) is illegal and should never
+ * be used. Note that page faults can be allowed to retry for multiple times,
+ * in which case we'll have an initial fault with flags (a) then later on
+ * continuous faults with flags (b). We should always try to detect pending
+ * signals before a retry to make sure the continuous page faults can still be
+ * interrupted if necessary.
*/
#define FAULT_FLAG_WRITE 0x01
#define FAULT_FLAG_MKWRITE 0x02
@@ -411,6 +430,24 @@ extern pgprot_t protection_map[16];
FAULT_FLAG_KILLABLE | \
FAULT_FLAG_INTERRUPTIBLE)
+/**
+ * fault_flag_allow_retry_first - check ALLOW_RETRY the first time
+ *
+ * This is mostly used for places where we want to try to avoid taking
+ * the mmap_sem for too long a time when waiting for another condition
+ * to change, in which case we can try to be polite to release the
+ * mmap_sem in the first round to avoid potential starvation of other
+ * processes that would also want the mmap_sem.
+ *
+ * Return: true if the page fault allows retry and this is the first
+ * attempt of the fault handling; false otherwise.
+ */
+static inline bool fault_flag_allow_retry_first(unsigned int flags)
+{
+ return (flags & FAULT_FLAG_ALLOW_RETRY) &&
+ (!(flags & FAULT_FLAG_TRIED));
+}
+
#define FAULT_FLAG_TRACE \
{ FAULT_FLAG_WRITE, "WRITE" }, \
{ FAULT_FLAG_MKWRITE, "MKWRITE" }, \
diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478270e1..590ec3a9f5da 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1386,7 +1386,7 @@ EXPORT_SYMBOL_GPL(__lock_page_killable);
int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags)
{
- if (flags & FAULT_FLAG_ALLOW_RETRY) {
+ if (fault_flag_allow_retry_first(flags)) {
/*
* CAUTION! In this case, mmap_sem is not released
* even though return 0.
diff --git a/mm/internal.h b/mm/internal.h
index 3cf20ab3ca01..5958cfe50a0c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -377,10 +377,10 @@ static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
/*
* FAULT_FLAG_RETRY_NOWAIT means we don't want to wait on page locks or
* anything, so we only pin the file and drop the mmap_sem if only
- * FAULT_FLAG_ALLOW_RETRY is set.
+ * FAULT_FLAG_ALLOW_RETRY is set, while this is the first attempt.
*/
- if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
- FAULT_FLAG_ALLOW_RETRY) {
+ if (fault_flag_allow_retry_first(flags) &&
+ !(flags & FAULT_FLAG_RETRY_NOWAIT)) {
fpin = get_file(vmf->vma->vm_file);
up_read(&vmf->vma->vm_mm->mmap_sem);
}
--
2.24.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.