* [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit
From: Bin Meng @ 2020-02-20 14:42 UTC (permalink / raw)
To: Alistair Francis, Bastian Koppelmann, Palmer Dabbelt,
Sagar Karandikar, qemu-devel, qemu-riscv
In-Reply-To: <1582209758-2996-1-git-send-email-bmeng.cn@gmail.com>
Update BIOS_FILENAME to select the 32-bit bios image file name when
building for a 32-bit target.
Tested booting Linux v5.5 32-bit image (built from rv32_defconfig
plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
---
hw/riscv/sifive_u.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index ca561d3..371133e 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -57,7 +57,11 @@
#include <libfdt.h>
-#define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin"
+#if defined(TARGET_RISCV32)
+# define BIOS_FILENAME "opensbi-riscv32-sifive_u-fw_jump.bin"
+#else
+# define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin"
+#endif
static const struct MemmapEntry {
hwaddr base;
--
2.7.4
^ permalink raw reply related
* Unexpected high latency on bbb
From: Laurentiu-Cristian Duca @ 2020-02-20 14:42 UTC (permalink / raw)
To: linux-rt-users
Hello,
I ran a latency test on a PREEMPT_RT 5.4.5 kernel
and got unexpectedly high latencies (5 ms max).
Maybe I am doing something wrong; please help.
The same test got 120 us max latency on the EVL project
(in a 3-hour test with one million samples),
and only the synchronization is different on PREEMPT_RT
(referring to the rt_gpio module described below).
scenario between beaglebone black computer and fpga board
====
fpga generates an interrupt and counts latency until ack
bbb receives and acks the interrupt via rt_gpio module
fpga sends latency counter value via SPI to bbb
bbb prints on the screen
result: 5ms max latency.
rt_gpio.c module
====
...
static struct swait_queue_head head_swait;
static irqreturn_t rt_interrupt_handler(int irq, void *data)
{
n_interrupts++;
/* use swait.h model to signal interrupt arrived */
interrupt_done = 1;
smp_mb();
if (swait_active(&head_swait))
swake_up_one(&head_swait);
return IRQ_HANDLED;
}
static ssize_t rt_gpio_read(struct file *f, char __user *buf, size_t
len, loff_t *off)
{
int ret;
DECLARE_SWAITQUEUE(swait);
/* wait for interrupt using swait.h model */
for (;;) {
prepare_to_swait_exclusive(&head_swait, &swait, TASK_INTERRUPTIBLE);
/* smp_mb() from set_current_state() */
if (interrupt_done)
break;
schedule();
}
finish_swait(&head_swait, &swait);
interrupt_done = 0;
smp_mb();
gpiod_get_raw_value(gpiod_in);
copy_to_user(...);
}
static ssize_t rt_gpio_write(struct file *f, const char __user *buf,
size_t len, loff_t *off)
{
copy_from_user(...);
gpiod_set_raw_value(gpiod_out, user_value);
}
static int __init rt_gpio_init(void) /* Constructor */
{
...
request_irq(irq_line, rt_interrupt_handler,
IRQ_TYPE_EDGE_RISING | IRQF_NO_THREAD, "rt_gpio interrupt\n", (void *)1);
init_swait_queue_head(&head_swait);
}
static void __exit rt_gpio_exit(void)
{
free_irq(irq_line, (void*)1);
gpio_free(gpio_in_id);
gpio_free(gpio_out_id);
...
}
userspace
====
printf("Setup this thread as a real time thread\n");
if((ret = mlockall(MCL_FUTURE|MCL_CURRENT)) < 0) {
printf("mlockall failed: %s\n", strerror(errno));
return -1;
}
sp.sched_priority = 98;
if((ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp)) != 0) {
printf("Failed to set lat thread to real-time priority: %s\n",
strerror(ret));
return -1;
}
...
use rt_gpio module to receive interrupt and ack.
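A minimal sketch of the same real-time setup, assuming glibc on Linux. The helper name setup_rt_thread() is hypothetical. It shows one detail that is easy to get wrong: mlockall() reports failure through errno, while pthread_setschedparam() returns the error number directly.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: lock all memory and switch the calling thread to
 * SCHED_FIFO at the given priority.
 * Returns 0 on success or a positive errno value on failure.
 */
static int setup_rt_thread(int priority)
{
    struct sched_param sp;
    int ret;

    /* mlockall() returns -1 and stores the reason in errno. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        ret = errno;
        fprintf(stderr, "mlockall failed: %s\n", strerror(ret));
        return ret;
    }

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;

    /* pthread_setschedparam() returns the error number itself. */
    ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (ret != 0) {
        fprintf(stderr, "pthread_setschedparam failed: %s\n", strerror(ret));
        return ret;
    }
    return 0;
}
```

Calling setup_rt_thread(98) typically needs root or CAP_SYS_NICE plus a sufficient RLIMIT_MEMLOCK; without them it returns EPERM or ENOMEM instead of 0.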
Thank you,
L-C
^ permalink raw reply
* [PATCH net] net: netlink: cap max groups which will be considered in netlink_bind()
From: Nikolay Aleksandrov @ 2020-02-20 14:42 UTC (permalink / raw)
To: netdev
Cc: davem, Nikolay Aleksandrov, Christophe Leroy, Richard Guy Briggs,
Erhard F .
Since nl_groups is a u32 we can't bind more groups via ->bind
(netlink_bind) call, but netlink has supported more groups via
setsockopt() for a long time and thus nlk->ngroups could be over 32.
Recently I added support for per-vlan notifications and increased the
groups to 33 for NETLINK_ROUTE which exposed an old bug in the
netlink_bind() code causing out-of-bounds access on archs where unsigned
long is 32 bits via test_bit() on a local variable. Fix this by capping the
maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
capping them at 32 which is the minimum of allocated groups and the
maximum groups which can be bound via netlink_bind().
CC: Christophe Leroy <christophe.leroy@c-s.fr>
CC: Richard Guy Briggs <rgb@redhat.com>
Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Dave, it is not necessary to queue this fix for stable releases since
NETLINK_ROUTE is the first to reach more groups after I added the vlan
notification changes and I don't think we'll ever backport new groups. :)
Up to you of course.
In fact looking at netlink_kernel_create nlk->groups can't be less than 32
so we can add a NETLINK_MIN_GROUPS == NETLINK_MAX_LEGACY_BIND_GRPS == 32
in net-next to replace the raw value.
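A minimal userspace sketch of the capped loop, not the kernel code itself. collect_bound_groups() is a hypothetical name, and BITS_PER_TYPE() is redefined here only to mirror the kernel macro. The loop bound is the width of the 32-bit mask, so the bit index can never run past the variable being tested, whatever ngroups is.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in a type, modelled on the kernel's BITS_PER_TYPE(). */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)

/*
 * Hypothetical model of the bind loop: visit every group set in a 32-bit
 * nl_groups mask. Stores the 1-based group numbers in out[] and returns
 * how many were stored.
 */
static size_t collect_bound_groups(uint32_t nl_groups, unsigned int out[32])
{
    size_t n = 0;

    for (unsigned int group = 0; group < BITS_PER_TYPE(uint32_t); group++) {
        if (!(nl_groups & (UINT32_C(1) << group)))
            continue;
        out[n++] = group + 1; /* netlink group numbers start at 1 */
    }
    return n;
}
```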
net/netlink/af_netlink.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4e31721e7293..edf3e285e242 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1014,7 +1014,8 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
if (nlk->netlink_bind && groups) {
int group;
- for (group = 0; group < nlk->ngroups; group++) {
+ /* nl_groups is a u32, so cap the maximum groups we can bind */
+ for (group = 0; group < BITS_PER_TYPE(u32); group++) {
if (!test_bit(group, &groups))
continue;
err = nlk->netlink_bind(net, group + 1);
@@ -1033,7 +1034,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
netlink_insert(sk, nladdr->nl_pid) :
netlink_autobind(sock);
if (err) {
- netlink_undo_bind(nlk->ngroups, groups, sk);
+ netlink_undo_bind(BITS_PER_TYPE(u32), groups, sk);
goto unlock;
}
}
--
2.24.1
^ permalink raw reply related
* [dpdk-dev] [PATCH] net/mlx5: fix running without rxq
From: Shiri Kuzin @ 2020-02-20 14:42 UTC (permalink / raw)
To: dev@dpdk.org
Cc: Matan Azrad, Raslan Darawsheh, Slava Ovsiienko, stable@dpdk.org
When running mlx5_dev_start in mlx5_ethdev, the function calls
mlx5_dev_configure_rss_reta in order to configure the Rx queues.
There is no check before mlx5_dev_configure_rss_reta for whether
any Rx queues exist, and with 0 Rx queues the function fails.
For example, this command:
/build/app/test-pmd/testpmd -n 4 -w 0000:08:00.0,rx_vec_en=0
-- --burst=64 --mbcache=512 -i --nb-cores=27 --txd=2048 --rxd=2048
--vxlan-gpe-port=6081 --mp-alloc=xbuf --rxq 0 --forward-mode=txonly
would fail.
In order to fix this issue, we should call mlx5_dev_configure_rss_reta
only if we have rxq's.
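The guard can be sketched in isolation as below. struct port, configure_rss_reta() and port_start() are hypothetical stand-ins, not the real mlx5 structures or functions, and the -EINVAL return is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for the device state. */
struct port {
    unsigned int nb_rx_queues;
    int reta_configured;
};

/* Stand-in for the RETA setup step: it cannot work without Rx queues. */
static int configure_rss_reta(struct port *p)
{
    if (p->nb_rx_queues == 0)
        return -EINVAL;
    p->reta_configured = 1;
    return 0;
}

/* Start sequence with the guard: RETA setup is skipped for Tx-only ports. */
static int port_start(struct port *p)
{
    if (p->nb_rx_queues > 0) {
        int ret = configure_rss_reta(p);

        if (ret)
            return ret;
    }
    /* Tx queue start and the rest of the sequence would follow here. */
    return 0;
}
```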
Fixes: 63bd16292c3a ("net/mlx5: support RSS on hairpin")
Cc: stable@dpdk.org
Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_trigger.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index be47df5..571b7a0 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -280,11 +280,13 @@
rte_net_mlx5_dynf_inline_mask = 1UL << fine_inline;
else
rte_net_mlx5_dynf_inline_mask = 0;
- ret = mlx5_dev_configure_rss_reta(dev);
- if (ret) {
- DRV_LOG(ERR, "port %u reta config failed: %s",
- dev->data->port_id, strerror(rte_errno));
- return -rte_errno;
+ if (dev->data->nb_rx_queues > 0) {
+ ret = mlx5_dev_configure_rss_reta(dev);
+ if (ret) {
+ DRV_LOG(ERR, "port %u reta config failed: %s",
+ dev->data->port_id, strerror(rte_errno));
+ return -rte_errno;
+ }
}
ret = mlx5_txq_start(dev);
if (ret) {
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v1 0/2] perf report: Support annotation of code without symbols
From: Jin, Yao @ 2020-02-20 14:42 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
In-Reply-To: <20200220120655.GA586895@krava>
On 2/20/2020 8:06 PM, Jiri Olsa wrote:
> On Thu, Feb 20, 2020 at 08:03:18PM +0800, Jin, Yao wrote:
>>
>>
>> On 2/20/2020 7:56 PM, Jiri Olsa wrote:
>>> On Thu, Feb 20, 2020 at 08:59:00AM +0800, Jin Yao wrote:
>>>> For perf report on stripped binaries it is currently impossible to do
>>>> annotation. The annotation state is all tied to symbols, but there are
>>>> either no symbols, or symbols are not covering all the code.
>>>>
>>>> We should support the annotation functionality even without symbols.
>>>>
>>>> The first patch uses al_addr to print because it's easy to dump
>>>> the instructions from this address in binary for branch mode.
>>>>
>>>> The second patch supports the annotation on stripped binary.
>>>>
>>>> Jin Yao (2):
>>>> perf util: Print al_addr when symbol is not found
>>>> perf annotate: Support interactive annotation of code without symbols
>>>
>>> looks good, but I'm getting crash when annotating unresolved kernel address:
>>>
>>> jirka
>>>
>>>
>>
>> Thanks for reporting the issue.
>>
>> I guess you are trying the "0xffffffff81c00ae7", let me try to reproduce
>> this issue.
>
> yes, I also checked and it did not happen before
>
> jirka
>
Hi Jiri,
Can you try this fix?
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index ff5711899234..5144528b2931 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2497,7 +2497,7 @@ add_annotate_opt(struct hist_browser *browser,
struct map_symbol *ms,
u64 addr)
{
- if (ms->map->dso->annotate_warned)
+ if (!ms->map || !ms->map->dso || ms->map->dso->annotate_warned)
return 0;
if (!ms->sym) {
It's tested OK at my side.
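The idea behind the check, as a standalone sketch. The structures are trimmed-down, hypothetical versions of the perf ones, and can_annotate() is an illustrative name only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down versions of the perf structures. */
struct dso { bool annotate_warned; };
struct map { struct dso *dso; };
struct map_symbol { struct map *map; };

/* True when annotation can be offered for this entry. */
static bool can_annotate(const struct map_symbol *ms)
{
    /*
     * || evaluates left to right and stops at the first true operand,
     * so ms->map->dso is only read once ms->map is known to be non-NULL.
     */
    if (!ms->map || !ms->map->dso || ms->map->dso->annotate_warned)
        return false;
    return true;
}
```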
Thanks
Jin Yao
^ permalink raw reply related
* Re: [PATCH] mm: memcontrol: asynchronous reclaim for memory.high
From: Johannes Weiner @ 2020-02-20 14:41 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Tejun Heo, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
In-Reply-To: <20200220094639.GD20509@dhcp22.suse.cz>
On Thu, Feb 20, 2020 at 10:46:39AM +0100, Michal Hocko wrote:
> On Wed 19-02-20 16:17:35, Johannes Weiner wrote:
> > On Wed, Feb 19, 2020 at 08:53:32PM +0100, Michal Hocko wrote:
> > > On Wed 19-02-20 14:16:18, Johannes Weiner wrote:
> [...]
> > > > [ This is generally work in process: for example, if you isolate
> > > > workloads with memory.low, kswapd cpu time isn't accounted to the
> > > > cgroup that causes it. Swap IO issued by kswapd isn't accounted to
> > > > the group that is getting swapped.
> > >
> > > Well, kswapd is a system activity and as such it is acceptable that it
> > > is accounted to the system. But in this case we are talking about a
> > > memcg configuration which influences all other workloads by stealing CPU
> > > cycles from them
> >
> > From a user perspective this isn't a meaningful distinction.
> >
> > If I partition my memory among containers and one cgroup is acting
> > out, I would want the culprit to be charged for the cpu cycles the
> > reclaim is causing. Whether I divide my machine up using memory.low or
> > using memory.max doesn't really matter: I'm choosing between the two
> > based on a *memory policy* I want to implement - work-conserving vs
> > non-conserving. I shouldn't have to worry about the kernel tracking
> > CPU cycles properly in the respective implementations of these knobs.
> >
> > So kswapd is very much a cgroup-attributable activity, *especially* if
> > I'm using memory.low to delineate different memory domains.
>
> While I understand what you are saying I do not think this is easily
> achievable with the current implementation. The biggest problem I can
> see is that you do not have a clear information who to charge for
> the memory shortage on a particular NUMA node with a pure low limit
> based balancing because the limit is not NUMA aware. Besides that the
> origin of the memory pressure might be outside of any memcg. You can
> punish/account all memcgs in excess in some manner, e.g. proportionally
> to their size/excess but I am not really sure how fair that will
> be. Sounds like an interesting project but also sounds like tangent to
> this patch.
>
> High/Max limits are quite different because they are dealing with
> the internal memory pressure and you can attribute it to the
> cgroup/hierarchy which is in excess. There is a clear domain to reclaim
> from. This is an easier model to reason about IMHO.
They're not different. memory.low is just a usage limit that happens
to be enforced lazily rather than immediately.
If I'm setting memory.high or memory.max and I allocate beyond it, my
memory will be reclaimed with the limit as the target.
If I'm setting memory.low and I allocate beyond it, my memory will
eventually be reclaimed with the limit as the target.
In either case, the cgroup who allocated the memory that is being
reclaimed is the one obviously responsible for the reclaim work. Why
would the time of limit enforcement change that?
If on the other hand an allocation reclaims you below your limit, such
as can happen with a NUMA-bound allocation, whether it's high, max, or
low, then that's their cost to pay. But it's not really that important
what we do in that case - the memcg settings aren't NUMA aware so that
whole scenario is out of the purview of the controller anyway.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d6085115c7f2..24fe6e9e64b1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2651,6 +2651,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+ bool account_cpu = current_is_kswapd() || current_work();
unsigned long reclaimed;
unsigned long scanned;
@@ -2673,6 +2674,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
continue;
}
memcg_memory_event(memcg, MEMCG_LOW);
+ account_cpu = false;
break;
case MEMCG_PROT_NONE:
/*
@@ -2688,11 +2690,17 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
reclaimed = sc->nr_reclaimed;
scanned = sc->nr_scanned;
+ if (account_cpu)
+ use_cpu_of_cgroup(memcg->css.cgroup);
+
shrink_lruvec(lruvec, sc);
shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
sc->priority);
+ if (account_cpu)
+ unuse_cpu_of_cgroup();
+
/* Record the group's reclaim efficiency */
vmpressure(sc->gfp_mask, memcg, false,
sc->nr_scanned - scanned,
> > > without much throttling on the consumer side - especially when the
> > > memory is reclaimable without a lot of sleeping or contention on
> > > locks etc.
> >
> > The limiting factor on the consumer side is IO. Reading a page is way
> > more costly than reclaiming it, which is why we built our isolation
> > stack starting with memory and IO control and are only now getting to
> > working on proper CPU isolation.
> >
> > > I am absolutely aware that we will never achieve a perfect isolation due
> > > to all sorts of shared data structures, lock contention and what not but
> > > this patch alone just allows spill over to unaccounted work way too
> > > easily IMHO.
> >
> > I understand your concern about CPU cycles escaping, and I share
> > it. My point is that this patch isn't adding a problem that isn't
> > already there, nor is it that much of a practical concern at the time
> > of this writing given the state of CPU isolation in general.
>
> I beg to differ here. Cpu controller should be able to isolate user
> contexts performing high limit reclaim now. Your patch is changing that
> functionality to become unaccounted for a large part and that might be
> seen as a regression for those workloads which partition the system by
> using high limit and also rely on cpu controller because workloads are
> CPU sensitive.
>
> Without the CPU controller support this patch is not complete and I do
> not see an absolute must to merge it ASAP because it is not a regression
> fix or something we cannot live without.
I think you're still thinking in a cgroup1 reality, where you would
set a memory limit in isolation and then eat a ton of CPU pushing up
against it.
In comprehensive isolation setups implemented in cgroup2, "heavily"
reclaimed containers are primarily IO bound on page faults, refaults,
writeback. The reclaim cost is a small part of it, and as I said, in a
magnitude range for which the CPU controller is currently too heavy.
We can carry this patch out of tree until the CPU controller is fixed,
but I think the reasoning to keep it out is not actually based on the
practical reality of a cgroup2 world.
^ permalink raw reply related
* Re: [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
From: Zi Yan @ 2020-02-20 14:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
linux-btrfs
In-Reply-To: <20200219210103.32400-5-willy@infradead.org>
On 19 Feb 2020, at 16:00, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Simplify the callers by moving the check for nr_pages and the BUG_ON
> into read_pages().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/readahead.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 61b15b6b9e72..9fcd4e32b62d 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -119,6 +119,9 @@ static void read_pages(struct address_space *mapping, struct file *filp,
> struct blk_plug plug;
> unsigned page_idx;
>
> + if (!nr_pages)
> + return;
> +
> blk_start_plug(&plug);
>
> if (mapping->a_ops->readpages) {
> @@ -138,6 +141,8 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>
> out:
> blk_finish_plug(&plug);
> +
> + BUG_ON(!list_empty(pages));
> }
>
> /*
> @@ -180,8 +185,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
> * contiguous pages before continuing with the next
> * batch.
> */
> - if (nr_pages)
> - read_pages(mapping, filp, &page_pool, nr_pages,
> + read_pages(mapping, filp, &page_pool, nr_pages,
> gfp_mask);
> nr_pages = 0;
> continue;
> @@ -202,9 +206,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
> * uptodate then the caller will launch readpage again, and
> * will then handle the error.
> */
> - if (nr_pages)
> - read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> - BUG_ON(!list_empty(&page_pool));
> + read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> }
>
> /*
> --
> 2.25.0
Looks good to me. Thanks.
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan Zi
^ permalink raw reply
* Re: [PATCH] gpio: mockup: coding-style fix
From: Linus Walleij @ 2020-02-20 14:41 UTC (permalink / raw)
To: Bartosz Golaszewski
Cc: Kent Gibson, open list:GPIO SUBSYSTEM,
linux-kernel@vger.kernel.org, Bartosz Golaszewski
In-Reply-To: <20200210155059.29609-1-brgl@bgdev.pl>
On Mon, Feb 10, 2020 at 4:51 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>
> The indentation is wrong in gpio_mockup_apply_pull(). This patch makes
> the code more readable.
>
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> ---
> This fixes another indentation error introduced in v5.5 that I missed before.
Patch applied.
Yours,
Linus Walleij
^ permalink raw reply
* Re: w83627ehf crash in 5.6.0-rc2-00055-gca7e1fd1026c
From: Guenter Roeck @ 2020-02-20 14:40 UTC (permalink / raw)
To: Dr. David Alan Gilbert, Meelis Roos; +Cc: linux-hwmon, LKML, Chen Zhou
In-Reply-To: <20200220135709.GB18071@gallifrey>
On 2/20/20 5:57 AM, Dr. David Alan Gilbert wrote:
> * Meelis Roos (mroos@linux.ee) wrote:
>>> It looks like not all chips have temp_label, so I think we need to change w83627ehf_is_visible
>>> which has:
>>>
>>> if (attr == hwmon_temp_input || attr == hwmon_temp_label)
>>> return 0444;
>>>
>>> to
>>> if (attr == hwmon_temp_input)
>>> return 0444;
>>> if (attr == hwmon_temp_label) {
>>> if (data->temp_label)
>>> return 0444;
>>> else
>>> return 0;
Nitpick: else after return isn't necessary. Too bad I didn't notice that before;
static analyzers will have a field day :-)
>>> }
>>>
>>> Does that work for you?
>> Yes, it works - sensors are displayed as they should be, with nothing in dmesg.
>>
>> Thank you for so quick response!
>
> Great, I need to turn that into a proper patch; (I might need to wait till
> Saturday for that, although if someone needs it before then please shout).
>
We'll want this fixed in the next stable release candidate, so I wrote one up
and submitted it.
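For reference, the visibility check with the redundant else dropped could look roughly like this standalone sketch. The enum, struct and function names are hypothetical stand-ins for the hwmon ones, and attributes other than the two discussed are simply hidden here.

```c
/* Hypothetical stand-ins; the real driver uses the hwmon attribute enums. */
enum temp_attr { TEMP_INPUT, TEMP_LABEL, TEMP_OTHER };

struct sensor_data {
    const char *const *temp_label; /* NULL on chips without labels */
};

/* Returns the sysfs file mode for an attribute, or 0 to hide it. */
static unsigned short temp_is_visible(const struct sensor_data *data,
                                      enum temp_attr attr)
{
    if (attr == TEMP_INPUT)
        return 0444;
    if (attr == TEMP_LABEL) {
        if (data->temp_label)
            return 0444;
        return 0; /* no labels on this chip: hide the attribute */
    }
    return 0; /* other attributes are out of scope for this sketch */
}
```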
Thanks,
Guenter
^ permalink raw reply
* RE: [PATCH v3 0/3] Introduce per-task latency_nice for scheduler hints
From: David Laight @ 2020-02-20 14:39 UTC (permalink / raw)
To: 'chris hyser', Parth Shah, vincent.guittot@linaro.org,
patrick.bellasi@matbug.net, valentin.schneider@arm.com,
dhaval.giani@oracle.com, dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
mingo@redhat.com, qais.yousef@arm.com, pavel@ucw.cz,
qperret@qperret.net, pjt@google.com, tj@kernel.org
In-Reply-To: <10f42efa-3750-491a-74fe-d84c9c4924e3@oracle.com>
From: chris hyser <chris.hyser@oracle.com>
> Sent: 19 February 2020 17:17
>
> On 2/19/20 6:18 AM, David Laight wrote:
> > From: chris hyser
> >> Sent: 18 February 2020 23:00
> > ...
> >> All, I was asked to take a look at the original latency_nice patchset.
> >> First, to clarify objectives, Oracle is not
> >> interested in trading throughput for latency.
> >> What we found is that the DB has specific tasks which do very little but
> >> need to do this as absolutely quickly as possible, ie extreme latency
> >> sensitivity. Second, the key to latency reduction
> >> in the task wakeup path seems to be limiting variations of "idle cpu" search.
> >> The latter particularly interests me as an example of "platform size
> >> based latency" which I believe to be important given all the varying size
> >> VMs and containers.
> >
> > From my experiments there are a few things that seem to affect latency
> > of waking up real time (sched fifo) tasks on a normal kernel:
>
> Sorry. I was only ever talking about sched_other as per the original patchset. I realize the term
> extreme latency
> sensitivity may have caused confusion. What that means to DB people is no doubt different than audio
> people. :-)
Shorter lines.....
ISTM you are making some already complicated code even more complex.
Better to make it simpler instead.
If you need a thread to run as soon as possible after it is woken,
why not use the RT scheduler (e.g. SCHED_FIFO)? That is what it is for.
If there are delays finding an idle cpu to migrate a process to
(especially on systems with large numbers of cpus) then that is a
general problem that can be addressed without extra knobs.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [Xen-devel] [PATCH] rwlock: allow recursive read locking when already locked in write mode
From: Roger Pau Monné @ 2020-02-20 14:38 UTC (permalink / raw)
To: Jürgen Groß
Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel
In-Reply-To: <ac515c56-e391-3636-244d-4b660c615dee@suse.com>
On Thu, Feb 20, 2020 at 03:23:38PM +0100, Jürgen Groß wrote:
> On 20.02.20 15:11, Roger Pau Monné wrote:
> > On Thu, Feb 20, 2020 at 01:48:54PM +0100, Jan Beulich wrote:
> > > On 20.02.2020 13:02, Roger Pau Monne wrote:
> > > > I've done some testing and at least the CPU down case is fixed now.
> > > > Posting early in order to get feedback on the approach taken.
> > >
> > > Looks good, thanks, just a question and two comments:
> > >
> > > > --- a/xen/include/xen/rwlock.h
> > > > +++ b/xen/include/xen/rwlock.h
> > > > @@ -20,21 +20,30 @@ typedef struct {
> > > > #define DEFINE_RWLOCK(l) rwlock_t l = RW_LOCK_UNLOCKED
> > > > #define rwlock_init(l) (*(l) = (rwlock_t)RW_LOCK_UNLOCKED)
> > > > -/*
> > > > - * Writer states & reader shift and bias.
> > > > - *
> > > > - * Writer field is 8 bit to allow for potential optimisation, see
> > > > - * _write_unlock().
> > > > - */
> > > > -#define _QW_WAITING 1 /* A writer is waiting */
> > > > -#define _QW_LOCKED 0xff /* A writer holds the lock */
> > > > -#define _QW_WMASK 0xff /* Writer mask.*/
> > > > -#define _QR_SHIFT 8 /* Reader count shift */
> > > > +/* Writer states & reader shift and bias. */
> > > > +#define _QW_WAITING 1 /* A writer is waiting */
> > > > +#define _QW_LOCKED 3 /* A writer holds the lock */
> > >
> > > Aiui things would work equally well if 2 was used here?
> >
> > I think so, I left it as 3 because previously LOCKED would also
> > include WAITING, and I didn't want to change it in case I've missed
> > some code path that was relying on that.
> >
> > >
> > > > +#define _QW_WMASK 3 /* Writer mask */
> > > > +#define _QW_CPUSHIFT 2 /* Writer CPU shift */
> > > > +#define _QW_CPUMASK 0x3ffc /* Writer CPU mask */
> > >
> > > At least on x86, the shift involved here is quite certainly
> > > more expensive than using wider immediates on AND and CMP
> > > > resulting from the _QW_WMASK-based comparisons. I'd therefore
> > > like to suggest to put the CPU in the low 12 bits.
> >
> > Hm right. The LOCKED and WAITING bits don't need shifting anyway.
> >
> > >
> > > Another option is to use the recurse_cpu field of the
> > > associated spin lock: The field is used for recursive locks
> > > only, and hence the only conflict would be with
> > > _spin_is_locked(), which we don't (and in the future then
> > > also shouldn't) use on this lock.
> >
> > I looked into that also, but things get more complicated AFAICT, as it's
> > not possible to atomically fetch the state of the lock and the owner
> > CPU at the same time. Nor could you set the LOCKED bit and the CPU
> > at the same time.
> >
> > >
> > > > @@ -166,7 +180,8 @@ static inline void _write_unlock(rwlock_t *lock)
> > > > * If the writer field is atomic, it can be cleared directly.
> > > > * Otherwise, an atomic subtraction will be used to clear it.
> > > > */
> > > > - atomic_sub(_QW_LOCKED, &lock->cnts);
> > > > + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
> > > > + atomic_sub(_write_lock_val(), &lock->cnts);
> > >
> > > I think this would be more efficient with atomic_and(), not
> > > the least because of the then avoided smp_processor_id().
> > > Whether to mask off just _QW_WMASK or also the CPU number of
> > > the last write owner would need to be determined. But with
> > > using subtraction, in case of problems it'll likely be
> > > harder to understand what actually went on, from looking at
> > > the resulting state of the lock (this is in part a pre-
> > > existing problem, but gets worse with subtraction of CPU
> > > numbers).
> >
> > Right, a mask would be better. Right now both need to be cleared (the
> > LOCKED and the CPU fields) as there's code that relies on !lock->cnts
> > as a way to determine that the lock is not read or write locked. If we
> > left the CPU lying around those checks would need to be adjusted.
>
> In case you make _QR_SHIFT 16 it is possible to just write a 2-byte zero
> value for write_unlock() (like it's possible to do so today using a
> single byte write).
That would limit the readers count to 65536. What do you think, Jan?
Thanks, Roger.
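To make the layout under discussion concrete, here is a minimal sketch in plain C, assuming the owner CPU sits in the low 12 bits, the writer state bits sit directly above it, and the reader count starts at bit 16. All SK_* names and sk_* helpers are hypothetical and are not the Xen definitions; the real code operates on an atomic_t, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only, not the Xen macros. */
#define SK_CPU_MASK     0x0fffu   /* bits 0-11: CPU of the write owner */
#define SK_W_WAITING    0x1000u   /* bit 12: a writer is waiting */
#define SK_W_LOCKED     0x2000u   /* bit 13: a writer holds the lock */
#define SK_W_MASK       (SK_W_WAITING | SK_W_LOCKED)
#define SK_WRITER_PART  0xffffu   /* bits 0-15: everything writer related */
#define SK_R_SHIFT      16        /* bits 16-31: reader count */

/* Value a writer on 'cpu' would store; the CPU needs no shift. */
static inline uint32_t sk_write_lock_val(unsigned int cpu)
{
    assert(cpu <= SK_CPU_MASK);
    return SK_W_LOCKED | cpu;
}

/* True if the lock word says 'cpu' currently holds the write lock. */
static inline bool sk_is_write_locked_by(uint32_t cnts, unsigned int cpu)
{
    return (cnts & (SK_W_MASK | SK_CPU_MASK)) == sk_write_lock_val(cpu);
}

/* Unlock by masking: clears state bits and owner CPU, keeps readers. */
static inline uint32_t sk_write_unlock(uint32_t cnts)
{
    return cnts & ~SK_WRITER_PART;
}

/* Number of readers recorded in the lock word. */
static inline unsigned int sk_reader_count(uint32_t cnts)
{
    return cnts >> SK_R_SHIFT;
}
```

With this layout, write unlock needs neither smp_processor_id() nor a subtraction: masking off the low 16 bits clears both the LOCKED bit and the owner CPU, so checks that rely on !lock->cnts keep working. The cost is that only 16 bits remain for the reader count.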
^ permalink raw reply
* [LTP] [PATCH] syscalls/ptrace: Merge ptrace01 and ptrace02
From: Cyril Hrubis @ 2020-02-20 14:39 UTC (permalink / raw)
To: ltp
In-Reply-To: <20200108135949.15048-1-jcronenberg@suse.de>
Hi!
Pushed with minor adjustments, thanks.
* Removed the runtest entry for ptrace02
* Removed the .gitignore entry for ptrace02
* Fixed the status handling at the end of the test
We should adjust our expectations based on which ptrace() call we did,
e.g. for PTRACE_CONT the child killed by SIGKILL is still a failure.
Also you have to call WIF*() first and only then you can get the
exit/term signal value.
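The ordering rule above can be sketched as follows. This is a minimal illustration, not the code that was pushed: decode_status(), status_matches() and the child_end enum are hypothetical names, and the expectation passed in would depend on which ptrace() request the test issued.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* How the child ended, as far as the test cares. */
enum child_end { CHILD_EXITED, CHILD_SIGNALED, CHILD_OTHER };

struct child_result {
    enum child_end how;
    int value;              /* exit code or terminating signal */
};

/* Classify with WIF*() first; only then extract the matching value. */
static struct child_result decode_status(int status)
{
    struct child_result r = { CHILD_OTHER, 0 };

    if (WIFEXITED(status)) {
        r.how = CHILD_EXITED;
        r.value = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        r.how = CHILD_SIGNALED;
        r.value = WTERMSIG(status);
    }
    return r;
}

/* Compare the decoded status against what this ptrace() request expects. */
static bool status_matches(int status, enum child_end how, int value)
{
    struct child_result r = decode_status(status);

    assert(how != CHILD_OTHER);   /* callers must expect a concrete ending */
    return r.how == how && r.value == value;
}
```

A test that expects the child to be killed would call status_matches(status, CHILD_SIGNALED, SIGKILL) after waitpid(), while a test that expects a normal exit would pass CHILD_EXITED and the expected exit code, so a SIGKILL in that case is reported as a failure.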
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply
* [PATCH] fstests: add another gap extent testcase for btrfs
From: Josef Bacik @ 2020-02-20 14:38 UTC (permalink / raw)
To: linux-btrfs, kernel-team, fstests
This is a testcase for a corner case that I missed when trying to fix gap
extents for btrfs. We would end up with gaps if we hole punched past
isize and then extended past the gap in a specific way. This is a
simple reproducer to show the problem, and has been properly fixed by my
patches now.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
tests/btrfs/204 | 85 +++++++++++++++++++++++++++++++++++++++++++++
tests/btrfs/204.out | 5 +++
tests/btrfs/group | 1 +
3 files changed, 91 insertions(+)
create mode 100755 tests/btrfs/204
create mode 100644 tests/btrfs/204.out
diff --git a/tests/btrfs/204 b/tests/btrfs/204
new file mode 100755
index 00000000..0d5c4bed
--- /dev/null
+++ b/tests/btrfs/204
@@ -0,0 +1,85 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2020 Facebook. All Rights Reserved.
+#
+# FS QA Test 204
+#
+# Validate that without no-holes we do not get an i_size that is after a gap in
+# the file extents on disk when punching a hole past i_size. This is fixed by
+# the following patches
+#
+# btrfs: use the file extent tree infrastructure
+# btrfs: replace all uses of btrfs_ordered_update_i_size
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmlogwrites
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_test
+_require_scratch
+_require_log_writes
+
+_log_writes_init $SCRATCH_DEV
+_log_writes_mkfs "-O ^no-holes" >> $seqres.full 2>&1
+
+# There's not a straightforward way to commit the transaction without also
+# flushing dirty pages, so shorten the commit interval to 1 so we're sure to get
+# a commit with our broken file
+_log_writes_mount -o commit=1
+
+# This creates a gap extent because fpunch doesn't insert hole extents past
+# i_size
+xfs_io -f -c "falloc -k 4k 8k" $SCRATCH_MNT/file
+xfs_io -f -c "fpunch 4k 4k" $SCRATCH_MNT/file
+
+# The pwrite extends the i_size to cover the gap extent, and then the truncate
+# sets the disk_i_size to 12k because it assumes everything was a-ok.
+xfs_io -f -c "pwrite 0 4k" $SCRATCH_MNT/file | _filter_xfs_io
+xfs_io -f -c "pwrite 0 8k" $SCRATCH_MNT/file | _filter_xfs_io
+xfs_io -f -c "truncate 12k" $SCRATCH_MNT/file
+
+# Wait for a transaction commit
+sleep 2
+
+_log_writes_unmount
+_log_writes_remove
+
+cur=$(_log_writes_find_next_fua 0)
+echo "cur=$cur" >> $seqres.full
+while [ ! -z "$cur" ]; do
+ _log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
+
+ # We only care about the fs consistency, so just run fsck, we don't have
+ # to mount the fs to validate it
+ _check_scratch_fs
+
+ cur=$(_log_writes_find_next_fua $(($cur + 1)))
+done
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/204.out b/tests/btrfs/204.out
new file mode 100644
index 00000000..44c7c8ae
--- /dev/null
+++ b/tests/btrfs/204.out
@@ -0,0 +1,5 @@
+QA output created by 204
+wrote 4096/4096 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 8192/8192 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 6acc6426..7a840177 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -206,3 +206,4 @@
201 auto quick punch log
202 auto quick subvol snapshot
203 auto quick send clone
+204 auto quick log replay
--
2.24.1
^ permalink raw reply related
* Re: [RFC PATCH 0/5] Removing support for 32bit KVM/arm host
From: Robin Murphy @ 2020-02-20 14:38 UTC (permalink / raw)
To: Marc Zyngier
Cc: Vladimir Murzin, Russell King, kvm, Arnd Bergmann,
Suzuki K Poulose, Quentin Perret, Christoffer Dall,
Krzysztof Kozlowski, Bartlomiej Zolnierkiewicz, James Morse,
linux-arm-kernel, Paolo Bonzini, Will Deacon, kvmarm,
Julien Thierry, Marek Szyprowski
In-Reply-To: <3f7f3b6c8b758b6d2134364616c6bc1e@kernel.org>
On 20/02/2020 2:01 pm, Marc Zyngier wrote:
> On 2020-02-20 13:32, Robin Murphy wrote:
>> On 20/02/2020 1:15 pm, Marc Zyngier wrote:
>>> Hi Marek,
>>>
>>> On 2020-02-20 12:44, Marek Szyprowski wrote:
>>>> Hi Marc,
>>>>
>>>> On 10.02.2020 15:13, Marc Zyngier wrote:
>>>>> KVM/arm was merged just over 7 years ago, and has lived a very quiet
>>>>> life so far. It mostly works if you're prepared to deal with its
>>>>> limitations, it has been a good prototype for the arm64 version,
>>>>> but it suffers a few problems:
>>>>>
>>>>> - It is incomplete (no debug support, no PMU)
>>>>> - It hasn't followed any of the architectural evolutions
>>>>> - It has zero users (I don't count myself here)
>>>>> - It is more and more getting in the way of new arm64 developments
>>>>
>>>> That is a bit sad information. Mainline Exynos finally got everything
>>>> that was needed to run it on the quite popular Samsung Exynos5422-based
>>>> Odroid XU4/HC1/MC1 boards. According to the Odroid related forums it is
>>>> being used. We also use it internally at Samsung.
>>>
>>> Something like "too little, too late" springs to mind, but let's be
>>> constructive. Is anyone using it in a production environment, where
>>> they rely on the latest mainline kernel having KVM support?
>>>
>>> The current proposal is to still have KVM support in 5.6, as well as
>>> ongoing support for stable kernels. If that's not enough, can you please
>>> explain your precise use case?
>>
>> Presumably there's no *technical* reason why the stable subset of v7
>> support couldn't be stripped down and brought back private to arch/arm
>> if somebody really wants and is willing to step up and look after it?
>
> There is no technical reason at all, just a maintenance effort.
>
> The main killer is the whole MMU code, which I'm butchering with NV,
> and that I suspect Will will also turn upside down with his stuff.
> Not to mention the hypercall interface that will need a complete overhaul.
>
> If we wanted to decouple the two, we'd need to make the MMU code, the
> hypercalls, arm.c and a number of other bits private to 32bit.
Right, the prospective kvm-arm maintainer's gameplan would essentially
be an equivalent "move virt/kvm/arm to arch/arm/kvm" patch, but then
ripping out all the Armv8 and GICv3 gubbins instead. Yes, there would
then be lots of *similar* code to start with, but it would only diverge
further as v8 architecture development continues independently.
Anyway, I just thought it seemed worth saying out loud, to reassure
folks that a realistic middle-ground between "yay bye!" and "oh no the
end of the world!" does exist, namely "someone else's problem" :)
Robin.
^ permalink raw reply
* [PATCH] hwmon: (w83627ehf) Fix crash seen with W83627DHG-P
From: Guenter Roeck @ 2020-02-20 14:37 UTC (permalink / raw)
To: Hardware Monitoring
Cc: Jean Delvare, Guenter Roeck, Meelis Roos, Dr . David Alan Gilbert
Loading the driver on a system with W83627DHG-P crashes as follows.
w83627ehf: Found W83627DHG-P chip at 0x290
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 604 Comm: sensors Not tainted 5.6.0-rc2-00055-gca7e1fd1026c #29
Hardware name: /D425KT, BIOS MWPNT10N.86A.0132.2013.0726.1534 07/26/2013
RIP: 0010:w83627ehf_read_string+0x27/0x70 [w83627ehf]
Code: [... ]
RSP: 0018:ffffb95980657df8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff96caaa7f5218 RCX: 0000000000000000
RDX: 0000000000000015 RSI: 0000000000000001 RDI: ffff96caa736ec08
RBP: 0000000000000000 R08: ffffb95980657e20 R09: 0000000000000001
R10: ffff96caaa635cc0 R11: 0000000000000000 R12: ffff96caa9f7cf00
R13: ffff96caa9ec3d00 R14: ffff96caa9ec3d28 R15: ffff96caa9ec3d40
FS: 00007fbc7c4e2740(0000) GS:ffff96caabc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000129d58000 CR4: 00000000000006f0
Call Trace:
? cp_new_stat+0x12d/0x160
hwmon_attr_show_string+0x37/0x70 [hwmon]
dev_attr_show+0x14/0x50
sysfs_kf_seq_show+0xb5/0x1b0
seq_read+0xcf/0x460
vfs_read+0x9b/0x150
ksys_read+0x5f/0xe0
do_syscall_64+0x48/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
Temperature labels are not always present. Adjust sysfs attribute
visibility accordingly.
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Fixes: 266cd5835947 ("hwmon: (w83627ehf) convert to with_info interface")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
drivers/hwmon/w83627ehf.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/hwmon/w83627ehf.c b/drivers/hwmon/w83627ehf.c
index 7ffadc2da57b..5a5120121e50 100644
--- a/drivers/hwmon/w83627ehf.c
+++ b/drivers/hwmon/w83627ehf.c
@@ -1346,8 +1346,13 @@ w83627ehf_is_visible(const void *drvdata, enum hwmon_sensor_types type,
/* channel 0.., name 1.. */
if (!(data->have_temp & (1 << channel)))
return 0;
- if (attr == hwmon_temp_input || attr == hwmon_temp_label)
+ if (attr == hwmon_temp_input)
return 0444;
+ if (attr == hwmon_temp_label) {
+ if (data->temp_label)
+ return 0444;
+ return 0;
+ }
if (channel == 2 && data->temp3_val_only)
return 0;
if (attr == hwmon_temp_max) {
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v3 03/10] ASoC: tegra: add Tegra210 based DMIC driver
From: Jon Hunter @ 2020-02-20 14:36 UTC (permalink / raw)
To: Sameer Pujar, perex, tiwai, robh+dt
Cc: devicetree, alsa-devel, atalambedu, lgirdwood, linux-kernel,
viswanathl, sharadg, broonie, thierry.reding, linux-tegra, digetx,
rlokhande, mkumard, dramesh
In-Reply-To: <1582180492-25297-4-git-send-email-spujar@nvidia.com>
On 20/02/2020 06:34, Sameer Pujar wrote:
> The Digital MIC (DMIC) Controller is used to interface with Pulse Density
> Modulation (PDM) input devices. The DMIC controller implements a converter
> to convert PDM signals to Pulse Code Modulation (PCM) signals. From signal
> flow perspective, the DMIC can be viewed as a PDM receiver.
>
> This patch registers DMIC component with ASoC framework. The component
> driver exposes DAPM widgets, routes and kcontrols for the device. The DAI
> driver exposes DMIC interfaces, which can be used to connect different
> components in the ASoC layer. Makefile and Kconfig support is added to
> allow to build the driver. The DMIC devices can be enabled in the DT via
> "nvidia,tegra210-dmic" compatible string. This driver can be used for
> Tegra186 and Tegra194 chips as well.
>
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Thanks!
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Cheers
Jon
--
nvpublic
^ permalink raw reply
* Re: [dpdk-dev] [PATCH 1/4] ci: remove unnecessary dependency on Linux headers
From: Aaron Conole @ 2020-02-20 14:37 UTC (permalink / raw)
To: David Marchand; +Cc: thomas, dev, Michael Santana
In-Reply-To: <20200219194131.29417-2-david.marchand@redhat.com>
David Marchand <david.marchand@redhat.com> writes:
> Following removal of kmod compilation, we don't need to install
> linux-headers anymore.
>
> Fixes: ea860973592b ("ci: remove redundant configs disabling kmods")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Aaron Conole <aconole@redhat.com>
^ permalink raw reply
* Re: [dpdk-dev] [PATCH 2/4] ci: fix Travis config warnings
From: Aaron Conole @ 2020-02-20 14:36 UTC (permalink / raw)
To: David Marchand; +Cc: thomas, dev, Michael Santana
In-Reply-To: <20200219194131.29417-3-david.marchand@redhat.com>
David Marchand <david.marchand@redhat.com> writes:
> Reading https://config.travis-ci.com/ and using
> https://config.travis-ci.com/explore to check changes, we can cleanup
> some warnings reported by the config validation options in Travis.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Aaron Conole <aconole@redhat.com>
Echoing what Thomas wrote, please include the issued warnings.
^ permalink raw reply
* Re: [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
From: Zi Yan @ 2020-02-20 14:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-fsdevel, linux-mm, linux-kernel, linux-btrfs, linux-erofs,
linux-ext4, linux-f2fs-devel, cluster-devel, ocfs2-devel,
linux-xfs
In-Reply-To: <20200219210103.32400-5-willy@infradead.org>
On 19 Feb 2020, at 16:00, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Simplify the callers by moving the check for nr_pages and the BUG_ON
> into read_pages().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/readahead.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 61b15b6b9e72..9fcd4e32b62d 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -119,6 +119,9 @@ static void read_pages(struct address_space *mapping, struct file *filp,
> struct blk_plug plug;
> unsigned page_idx;
>
> + if (!nr_pages)
> + return;
> +
> blk_start_plug(&plug);
>
> if (mapping->a_ops->readpages) {
> @@ -138,6 +141,8 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>
> out:
> blk_finish_plug(&plug);
> +
> + BUG_ON(!list_empty(pages));
> }
>
> /*
> @@ -180,8 +185,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
> * contiguous pages before continuing with the next
> * batch.
> */
> - if (nr_pages)
> - read_pages(mapping, filp, &page_pool, nr_pages,
> + read_pages(mapping, filp, &page_pool, nr_pages,
> gfp_mask);
> nr_pages = 0;
> continue;
> @@ -202,9 +206,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
> * uptodate then the caller will launch readpage again, and
> * will then handle the error.
> */
> - if (nr_pages)
> - read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> - BUG_ON(!list_empty(&page_pool));
> + read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> }
>
> /*
> --
> 2.25.0
Looks good to me. Thanks.
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan Zi
^ permalink raw reply
* [PATCH] trace-cmd: Add output to show "make test"
From: Steven Rostedt @ 2020-02-20 14:36 UTC (permalink / raw)
To: Linux Trace Devel
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
The unit tests are made with "make test". In order to let developers know of
this option, have it displayed when just "make" is used.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/Makefile b/Makefile
index 60cd1570..a3facaa9 100644
--- a/Makefile
+++ b/Makefile
@@ -364,6 +364,7 @@ $(obj)/lib/traceevent/plugins/trace_python_dir: force
show_gui_make:
@echo "Note: to build the gui, type \"make gui\""
@echo " to build man pages, type \"make doc\""
+ @echo " to build unit tests, type \"make test\""
PHONY += show_gui_make
--
2.20.1
^ permalink raw reply related