* Re: [PATCH v2 2/2] kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
From: Kees Cook @ 2020-12-02 18:56 UTC
To: Masahiro Yamada
Cc: Michal Marek, kernelci . org bot, Linux Kbuild mailing list,
Catalin Marinas, Mark Brown,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Nick Desaulniers,
Russell King, LKML, linuxppc-dev, Arvind Sankar, Ingo Molnar,
Borislav Petkov, clang-built-linux, Nathan Chancellor,
Will Deacon, Thomas Gleixner, Linux ARM
In-Reply-To: <CAK7LNAQGqcCBBFbKwe_eTuBqtNwDn_q8c0nPBJVsEoHP6F+aKA@mail.gmail.com>
On Wed, Dec 02, 2020 at 11:37:38AM +0900, Masahiro Yamada wrote:
> On Wed, Dec 2, 2020 at 5:56 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Tue, Dec 01, 2020 at 10:31:37PM +0900, Masahiro Yamada wrote:
> > > On Wed, Nov 25, 2020 at 7:22 AM Kees Cook <keescook@chromium.org> wrote:
> > > >
> > > > On Thu, Nov 19, 2020 at 01:13:27PM -0800, Nick Desaulniers wrote:
> > > > > On Thu, Nov 19, 2020 at 12:57 PM Nathan Chancellor
> > > > > <natechancellor@gmail.com> wrote:
> > > > > >
> > > > > > ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
> > > > > > along with a few others. Newer versions of ld.lld do not have these
> > > > > > warnings. As a result, do not add '--orphan-handling=warn' to
> > > > > > LDFLAGS_vmlinux if ld.lld's version is not new enough.
> > > > > >
> > > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1187
> > > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1193
> > > > > > Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
> > > > > > Reported-by: kernelci.org bot <bot@kernelci.org>
> > > > > > Reported-by: Mark Brown <broonie@kernel.org>
> > > > > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > > > > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > > > >
> > > > > Thanks for the additions in v2.
> > > > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > > >
> > > > I'm going to carry this for a few days in -next, and if no one screams,
> > > > ask Linus to pull it for v5.10-rc6.
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Kees Cook
> > >
> > >
> > > Sorry for the delay.
> > > Applied to linux-kbuild.
> >
> > Great, thanks!
> >
> > > But, I already see this in linux-next.
> > > Please let me know if I should drop it from my tree.
> >
> > My intention was to get this to Linus this week. Do you want to do that
> > yourself, or Ack the patches in my tree and I'll send it?
>
> I will send a kbuild pull request myself this week.
Okay, thanks! I've removed it from my -next tree now.
--
Kees Cook
* Re: [PATCH v2 17/17] ibmvfc: provide modules parameters for MQ settings
From: Brian King @ 2020-12-02 18:40 UTC
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-18-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized system. "
> + "[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]");
> +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with less channels. "
> + "[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]");
Both of these are writeable, but it doesn't look like you do any re-negotiation
with the VIOS for these changed settings to take effect if someone changes
them at runtime.
> +
> module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR);
> MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. "
> "[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]");
> @@ -3228,6 +3250,36 @@ static ssize_t ibmvfc_store_log_level(struct device *dev,
> return strlen(buf);
> }
>
> +static ssize_t ibmvfc_show_scsi_channels(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + int len;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return len;
> +}
> +
> +static ssize_t ibmvfc_store_scsi_channels(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + unsigned int channels;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + channels = simple_strtoul(buf, NULL, 10);
> + vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues);
Don't we need to do a LIP here for this new setting to go into effect?
> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return strlen(buf);
> +}
> +
> static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, NULL);
> static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL);
> static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL);
--
Brian King
Power Linux I/O
IBM Linux Technology Center
* Re: [PATCH v2 16/17] ibmvfc: enable MQ and set reasonable defaults
From: Brian King @ 2020-12-02 18:31 UTC
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-17-tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
* Re: [PATCH v2 15/17] ibmvfc: send Cancel MAD down each hw scsi channel
From: Brian King @ 2020-12-02 18:27 UTC
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-16-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> In general the client needs to send Cancel MADs and task management
> commands down the same channel as the command(s) intended to cancel or
> abort. The client assigns cancel keys per LUN and thus must send a
> Cancel down each channel commands were submitted for that LUN. Further,
> the client then must wait for those cancel completions prior to
> submitting a LUN RESET or ABORT TASK SET.
>
> Allocate event pointers for each possible scsi channel and assign an
> event for each channel that requires a cancel. Wait for completion of each
> submitted cancel.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 106 +++++++++++++++++++++------------
> 1 file changed, 68 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 0b6284020f06..97e8eed04b01 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2339,32 +2339,52 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
> {
> struct ibmvfc_host *vhost = shost_priv(sdev->host);
> struct ibmvfc_event *evt, *found_evt;
> - union ibmvfc_iu rsp;
> - int rsp_rc = -EBUSY;
> + struct ibmvfc_event **evt_list;
> + union ibmvfc_iu *rsp;
> + int rsp_rc = 0;
> unsigned long flags;
> u16 status;
> + int num_hwq = 1;
> + int i;
> + int ret = 0;
>
> ENTER;
> spin_lock_irqsave(vhost->host->host_lock, flags);
> - found_evt = NULL;
> - list_for_each_entry(evt, &vhost->sent, queue) {
> - if (evt->cmnd && evt->cmnd->device == sdev) {
> - found_evt = evt;
> - break;
> + if (vhost->using_channels && vhost->scsi_scrqs.active_queues)
> + num_hwq = vhost->scsi_scrqs.active_queues;
> +
> + evt_list = kcalloc(num_hwq, sizeof(*evt_list), GFP_KERNEL);
> + rsp = kcalloc(num_hwq, sizeof(*rsp), GFP_KERNEL);
Can't this just go on the stack? We don't want to be allocating memory
during error recovery. Or, alternatively, you could put this in the
vhost structure and protect it with a mutex. We only have enough events
to single thread these anyway.
> +
> + for (i = 0; i < num_hwq; i++) {
> + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands on queue %d.\n", i);
Prior to this patch, if there was nothing outstanding to the device and cancel_all was called,
no messages would get printed. This is changing that behavior. Is that intentional? Additionally,
it looks like this will get a lot more verbose, logging a message for each hw queue, regardless
of whether there was anything outstanding. Perhaps you want to move this down to after the check
for !found_evt?
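The suggestion above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct evt is a simplified stand-in for struct ibmvfc_event, queues_needing_cancel() is a hypothetical helper that does not exist in the driver, and the counter marks the spot where the sdev_printk() call would sit once it is moved below the !found_evt check.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct ibmvfc_event (assumption, not the real layout). */
struct evt {
	int has_cmnd;	/* models evt->cmnd != NULL */
	int dev;	/* models evt->cmnd->device */
	int hwq;	/* models evt->hwq */
};

/*
 * Hypothetical helper: walk the sent list once per hardware queue and
 * count the queues that have something outstanding for 'dev'. The count
 * equals the number of log messages that would be printed if the message
 * is emitted only after a matching event has been found.
 */
static int queues_needing_cancel(const struct evt *sent, size_t n,
				 int dev, int num_hwq)
{
	int logged = 0;

	for (int i = 0; i < num_hwq; i++) {
		const struct evt *found = NULL;

		for (size_t j = 0; j < n; j++) {
			if (sent[j].has_cmnd && sent[j].dev == dev &&
			    sent[j].hwq == i) {
				found = &sent[j];
				break;
			}
		}

		if (!found)
			continue;	/* nothing outstanding: stay silent */

		logged++;		/* the sdev_printk() would go here */
	}

	return logged;
}
```

With this ordering, a device with nothing outstanding produces no messages at all, which matches the behavior before the patch.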
> +
> + found_evt = NULL;
> + list_for_each_entry(evt, &vhost->sent, queue) {
> + if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq == i) {
> + found_evt = evt;
> + break;
> + }
> }
> - }
>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
* Re: [PATCH kernel] powerpc/kuap: Restore AMR after replaying soft interrupts
From: kernel test robot @ 2020-12-02 18:08 UTC
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Alexey Kardashevskiy, clang-built-linux, kbuild-all,
Nicholas Piggin
In-Reply-To: <20201202010952.7157-1-aik@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 12052 bytes --]
Hi Alexey,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.10-rc6 next-20201201]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201202-094132
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r024-20201202 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 2671fccf0381769276ca8246ec0499adcb9b0355)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# https://github.com/0day-ci/linux/commit/6b38a9b10a8384beeaa820e1c935cc4cabdb951e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201202-094132
git checkout 6b38a9b10a8384beeaa820e1c935cc4cabdb951e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from arch/powerpc/kernel/irq.c:31:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:10:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:604:
arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:100:1: note: expanded from here
__do_insw
^
arch/powerpc/include/asm/io.h:542:56: note: expanded from macro '__do_insw'
#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from arch/powerpc/kernel/irq.c:31:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:10:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:604:
arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:102:1: note: expanded from here
__do_insl
^
arch/powerpc/include/asm/io.h:543:56: note: expanded from macro '__do_insl'
#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from arch/powerpc/kernel/irq.c:31:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:10:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:604:
arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:104:1: note: expanded from here
__do_outsb
^
arch/powerpc/include/asm/io.h:544:58: note: expanded from macro '__do_outsb'
#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from arch/powerpc/kernel/irq.c:31:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:10:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:604:
arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:106:1: note: expanded from here
__do_outsw
^
arch/powerpc/include/asm/io.h:545:58: note: expanded from macro '__do_outsw'
#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from arch/powerpc/kernel/irq.c:31:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:10:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:604:
arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:108:1: note: expanded from here
__do_outsl
^
arch/powerpc/include/asm/io.h:546:58: note: expanded from macro '__do_outsl'
#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
>> arch/powerpc/kernel/irq.c:224:29: error: implicit declaration of function 'get_kuap' [-Werror,-Wimplicit-function-declaration]
unsigned long kuap_state = get_kuap();
^
>> arch/powerpc/kernel/irq.c:313:2: error: implicit declaration of function 'set_kuap' [-Werror,-Wimplicit-function-declaration]
set_kuap(kuap_state);
^
arch/powerpc/kernel/irq.c:313:2: note: did you mean 'get_kuap'?
arch/powerpc/kernel/irq.c:224:29: note: 'get_kuap' declared here
unsigned long kuap_state = get_kuap();
^
12 warnings and 2 errors generated.
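Errors of this kind usually mean a helper is declared only under some configuration guard while its caller is built unconditionally. One common remedy is to provide no-op stubs for the disabled case. The sketch below is a user-space illustration of that pattern only: HAVE_KUAP is a hypothetical stand-in for whichever Kconfig symbol guards the real helpers (the report does not say which), and replay_with_kuap_restore() is an invented name that models the save/restore pair added to replay_soft_interrupts().

```c
#include <assert.h>

/* HAVE_KUAP is a hypothetical guard, standing in for the real Kconfig symbol. */
#ifdef HAVE_KUAP
unsigned long get_kuap(void);
void set_kuap(unsigned long value);
#else
/* Stubs so that callers compile when the feature is not built in. */
static inline unsigned long get_kuap(void)
{
	return 0;
}

static inline void set_kuap(unsigned long value)
{
	(void)value;	/* nothing to restore */
}
#endif

/* Invented name: models the save/restore added around the replay loop. */
static unsigned long replay_with_kuap_restore(void)
{
	unsigned long kuap_state = get_kuap();

	/* ... pending soft-masked interrupts would be replayed here ... */

	set_kuap(kuap_state);
	return kuap_state;
}
```

Whether stubs or a config-conditional call site is the right fix for this patch is a decision for the patch author; the sketch only shows why the stubbed variant builds in both configurations.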
vim +/get_kuap +224 arch/powerpc/kernel/irq.c
214
215 void replay_soft_interrupts(void)
216 {
217 /*
218 * We use local_paca rather than get_paca() to avoid all
219 * the debug_smp_processor_id() business in this low level
220 * function
221 */
222 unsigned char happened = local_paca->irq_happened;
223 struct pt_regs regs;
> 224 unsigned long kuap_state = get_kuap();
225
226 ppc_save_regs(&regs);
227 regs.softe = IRQS_ENABLED;
228
229 again:
230 if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
231 WARN_ON_ONCE(mfmsr() & MSR_EE);
232
233 if (happened & PACA_IRQ_HARD_DIS) {
234 /*
235 * We may have missed a decrementer interrupt if hard disabled.
236 * Check the decrementer register in case we had a rollover
237 * while hard disabled.
238 */
239 if (!(happened & PACA_IRQ_DEC)) {
240 if (decrementer_check_overflow())
241 happened |= PACA_IRQ_DEC;
242 }
243 }
244
245 /*
246 * Force the delivery of pending soft-disabled interrupts on PS3.
247 * Any HV call will have this side effect.
248 */
249 if (firmware_has_feature(FW_FEATURE_PS3_LV1)) {
250 u64 tmp, tmp2;
251 lv1_get_version_info(&tmp, &tmp2);
252 }
253
254 /*
255 * Check if an hypervisor Maintenance interrupt happened.
256 * This is a higher priority interrupt than the others, so
257 * replay it first.
258 */
259 if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (happened & PACA_IRQ_HMI)) {
260 local_paca->irq_happened &= ~PACA_IRQ_HMI;
261 regs.trap = 0xe60;
262 handle_hmi_exception(&regs);
263 if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
264 hard_irq_disable();
265 }
266
267 if (happened & PACA_IRQ_DEC) {
268 local_paca->irq_happened &= ~PACA_IRQ_DEC;
269 regs.trap = 0x900;
270 timer_interrupt(&regs);
271 if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
272 hard_irq_disable();
273 }
274
275 if (happened & PACA_IRQ_EE) {
276 local_paca->irq_happened &= ~PACA_IRQ_EE;
277 regs.trap = 0x500;
278 do_IRQ(&regs);
279 if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
280 hard_irq_disable();
281 }
282
283 if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (happened & PACA_IRQ_DBELL)) {
284 local_paca->irq_happened &= ~PACA_IRQ_DBELL;
285 if (IS_ENABLED(CONFIG_PPC_BOOK3E))
286 regs.trap = 0x280;
287 else
288 regs.trap = 0xa00;
289 doorbell_exception(&regs);
290 if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
291 hard_irq_disable();
292 }
293
294 /* Book3E does not support soft-masking PMI interrupts */
295 if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (happened & PACA_IRQ_PMI)) {
296 local_paca->irq_happened &= ~PACA_IRQ_PMI;
297 regs.trap = 0xf00;
298 performance_monitor_exception(&regs);
299 if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
300 hard_irq_disable();
301 }
302
303 happened = local_paca->irq_happened;
304 if (happened & ~PACA_IRQ_HARD_DIS) {
305 /*
306 * We are responding to the next interrupt, so interrupt-off
307 * latencies should be reset here.
308 */
309 trace_hardirqs_on();
310 trace_hardirqs_off();
311 goto again;
312 }
> 313 set_kuap(kuap_state);
314 }
315
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23722 bytes --]
* Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement
From: Tyrel Datwyler @ 2020-12-02 17:27 UTC
To: Brian King, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <a11c0e6a-cfa6-0dc4-5d34-6fd35ae1f29b@linux.vnet.ibm.com>
On 12/2/20 7:14 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> Introduce several new vhost fields for managing MQ state of the adapter
>> as well as initial defaults for MQ enablement.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>> drivers/scsi/ibmvscsi/ibmvfc.c | 9 ++++++++-
>> drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>> 2 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index 42e4d35e0d35..f1d677a7423d 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>> }
>>
>> shost->transportt = ibmvfc_transport_template;
>> - shost->can_queue = max_requests;
>> + shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>
> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.
Our max_requests is the total number of commands allowed across all queues. From
what I understand, can_queue is the total number of commands in flight allowed
for each hw queue.
/*
* In scsi-mq mode, the number of hardware queues supported by the LLD.
*
* Note: it is assumed that each hardware queue has a queue depth of
* can_queue. In other words, the total queue depth per host
* is nr_hw_queues * can_queue. However, for when host_tagset is set,
* the total queue depth is can_queue.
*/
We currently don't use the host wide shared tagset.
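To make the arithmetic in that comment concrete, here is a minimal user-space sketch. The helper names per_queue_depth() and total_host_depth() are hypothetical and exist only for illustration; they are not part of ibmvfc or the SCSI midlayer, and the numbers in the note below are made up.

```c
#include <assert.h>

/*
 * Hypothetical helper: split a host-wide request budget evenly across
 * hardware queues, giving the value a driver would place in can_queue.
 */
static unsigned int per_queue_depth(unsigned int max_requests,
				    unsigned int nr_hw_queues,
				    int host_tagset)
{
	/* With a shared host tagset, can_queue is already the host total. */
	if (host_tagset || nr_hw_queues == 0)
		return max_requests;

	return max_requests / nr_hw_queues;	/* integer division rounds down */
}

/*
 * Total depth implied by the quoted comment: nr_hw_queues * can_queue,
 * unless host_tagset is set, in which case it is just can_queue.
 */
static unsigned int total_host_depth(unsigned int can_queue,
				     unsigned int nr_hw_queues,
				     int host_tagset)
{
	return host_tagset ? can_queue : can_queue * nr_hw_queues;
}
```

For example, a budget of 100 requests over 16 queues gives a per-queue depth of 6 and an implied host total of 96, so a few requests of the original budget go unused because of rounding.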
-Tyrel
>
>> shost->max_lun = max_lun;
>> shost->max_id = max_targets;
>> shost->max_sectors = IBMVFC_MAX_SECTORS;
>> shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>> shost->unique_id = shost->host_no;
>> + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>>
>> vhost = shost_priv(shost);
>> INIT_LIST_HEAD(&vhost->sent);
>
>
>
* Re: [PATCH 00/13] ibmvfc: initial MQ development
From: Tyrel Datwyler @ 2020-12-02 17:19 UTC
To: Hannes Reinecke, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <90e9a8ac-d2b9-bb64-7c7d-607adaea0f26@suse.de>
On 12/2/20 4:03 AM, Hannes Reinecke wrote:
> On 11/26/20 2:48 AM, Tyrel Datwyler wrote:
>> Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
>> towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
>> Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
>> partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
>> negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.
>>
>> This initial implementation adds the necessary Sub-CRQ framework and implements
>> the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
>> HW backed channels. The event pool and locking still leverages the legacy single
>> queue implementation, and as such lock contention is problematic when increasing
>> the number of queues. However, this initial work demonstrates a 1.2x factor
>> increase in IOPs when configured with two HW queues despite lock contention.
>>
> Why do you still hold the host lock during submission?
Proof of concept.
> An initial check on the submission code path didn't reveal anything obvious, so
> it _should_ be possible to drop the host lock there.
It's used to protect the event pool and the event free/sent lists. This could
probably have its own lock instead of the host lock.
> Or at least move it into the submission function itself to avoid lock
> contention. Hmm?
I have a followup patch to do that, but I didn't see any change in performance.
I've got another patch I'm finishing that provides dedicated event pools for
each subqueue such that they will no longer have any dependency on the host lock.
-Tyrel
>
> Cheers,
>
> Hannes
* Re: [PATCH] drivers: char: tpm: remove unneeded MODULE_VERSION() usage
From: Jarkko Sakkinen @ 2020-12-02 17:06 UTC
To: Enrico Weigelt, metux IT consult
Cc: linux-kernel, jgg, paulus, linux-integrity, linuxppc-dev,
peterhuewe
In-Reply-To: <20201202121553.9383-1-info@metux.net>
On Wed, Dec 02, 2020 at 01:15:53PM +0100, Enrico Weigelt, metux IT consult wrote:
> Remove MODULE_VERSION(), as it isn't needed at all: the only version
> making sense is the kernel version.
The kernel version does not make sense here either. Why are you mentioning
it in the commit message? Please just derive the commit message from
the one that Greg wrote.
> Link: https://lkml.org/lkml/2017/11/22/480
>
Remove the spurious empty line.
> Signed-off-by: Enrico Weigelt <info@metux.net>
> ---
> drivers/char/tpm/st33zp24/i2c.c | 1 -
> drivers/char/tpm/st33zp24/spi.c | 1 -
> drivers/char/tpm/st33zp24/st33zp24.c | 1 -
> drivers/char/tpm/tpm-interface.c | 1 -
> drivers/char/tpm/tpm_atmel.c | 1 -
> drivers/char/tpm/tpm_crb.c | 1 -
> drivers/char/tpm/tpm_i2c_infineon.c | 1 -
> drivers/char/tpm/tpm_ibmvtpm.c | 1 -
> drivers/char/tpm/tpm_infineon.c | 1 -
> drivers/char/tpm/tpm_nsc.c | 1 -
> drivers/char/tpm/tpm_tis.c | 1 -
> drivers/char/tpm/tpm_tis_core.c | 1 -
> drivers/char/tpm/tpm_vtpm_proxy.c | 1 -
> 13 files changed, 13 deletions(-)
>
> diff --git a/drivers/char/tpm/st33zp24/i2c.c b/drivers/char/tpm/st33zp24/i2c.c
> index 7c617edff4ca..7ed9829cacc4 100644
> --- a/drivers/char/tpm/st33zp24/i2c.c
> +++ b/drivers/char/tpm/st33zp24/i2c.c
> @@ -313,5 +313,4 @@ module_i2c_driver(st33zp24_i2c_driver);
>
> MODULE_AUTHOR("TPM support (TPMsupport@list.st.com)");
> MODULE_DESCRIPTION("STM TPM 1.2 I2C ST33 Driver");
> -MODULE_VERSION("1.3.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
> index a75dafd39445..147efea4eb05 100644
> --- a/drivers/char/tpm/st33zp24/spi.c
> +++ b/drivers/char/tpm/st33zp24/spi.c
> @@ -430,5 +430,4 @@ module_spi_driver(st33zp24_spi_driver);
>
> MODULE_AUTHOR("TPM support (TPMsupport@list.st.com)");
> MODULE_DESCRIPTION("STM TPM 1.2 SPI ST33 Driver");
> -MODULE_VERSION("1.3.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/st33zp24/st33zp24.c b/drivers/char/tpm/st33zp24/st33zp24.c
> index 4ec10ab5e576..e0f1a5828993 100644
> --- a/drivers/char/tpm/st33zp24/st33zp24.c
> +++ b/drivers/char/tpm/st33zp24/st33zp24.c
> @@ -646,5 +646,4 @@ EXPORT_SYMBOL(st33zp24_pm_resume);
>
> MODULE_AUTHOR("TPM support (TPMsupport@list.st.com)");
> MODULE_DESCRIPTION("ST33ZP24 TPM 1.2 driver");
> -MODULE_VERSION("1.3.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> index 1621ce818705..dfdc68b8bf88 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -514,5 +514,4 @@ module_exit(tpm_exit);
>
> MODULE_AUTHOR("Leendert van Doorn (leendert@watson.ibm.com)");
> MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_atmel.c b/drivers/char/tpm/tpm_atmel.c
> index 54a6750a6757..35bf249cc95a 100644
> --- a/drivers/char/tpm/tpm_atmel.c
> +++ b/drivers/char/tpm/tpm_atmel.c
> @@ -231,5 +231,4 @@ module_exit(cleanup_atmel);
>
> MODULE_AUTHOR("Leendert van Doorn (leendert@watson.ibm.com)");
> MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
> index a9dcf31eadd2..3e72b7b99cce 100644
> --- a/drivers/char/tpm/tpm_crb.c
> +++ b/drivers/char/tpm/tpm_crb.c
> @@ -748,5 +748,4 @@ static struct acpi_driver crb_acpi_driver = {
> module_acpi_driver(crb_acpi_driver);
> MODULE_AUTHOR("Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>");
> MODULE_DESCRIPTION("TPM2 Driver");
> -MODULE_VERSION("0.1");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_i2c_infineon.c b/drivers/char/tpm/tpm_i2c_infineon.c
> index a19d32cb4e94..8920b7c19fcb 100644
> --- a/drivers/char/tpm/tpm_i2c_infineon.c
> +++ b/drivers/char/tpm/tpm_i2c_infineon.c
> @@ -731,5 +731,4 @@ static struct i2c_driver tpm_tis_i2c_driver = {
> module_i2c_driver(tpm_tis_i2c_driver);
> MODULE_AUTHOR("Peter Huewe <peter.huewe@infineon.com>");
> MODULE_DESCRIPTION("TPM TIS I2C Infineon Driver");
> -MODULE_VERSION("2.2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..5b04d113f634 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -750,5 +750,4 @@ module_exit(ibmvtpm_module_exit);
>
> MODULE_AUTHOR("adlai@us.ibm.com");
> MODULE_DESCRIPTION("IBM vTPM Driver");
> -MODULE_VERSION("1.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_infineon.c b/drivers/char/tpm/tpm_infineon.c
> index 9c924a1440a9..8a58966c5c9b 100644
> --- a/drivers/char/tpm/tpm_infineon.c
> +++ b/drivers/char/tpm/tpm_infineon.c
> @@ -621,5 +621,4 @@ module_pnp_driver(tpm_inf_pnp_driver);
>
> MODULE_AUTHOR("Marcel Selhorst <tpmdd@sirrix.com>");
> MODULE_DESCRIPTION("Driver for Infineon TPM SLD 9630 TT 1.1 / SLB 9635 TT 1.2");
> -MODULE_VERSION("1.9.2");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_nsc.c b/drivers/char/tpm/tpm_nsc.c
> index 038701d48351..6ab2fe7e8782 100644
> --- a/drivers/char/tpm/tpm_nsc.c
> +++ b/drivers/char/tpm/tpm_nsc.c
> @@ -412,5 +412,4 @@ module_exit(cleanup_nsc);
>
> MODULE_AUTHOR("Leendert van Doorn (leendert@watson.ibm.com)");
> MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 4ed6e660273a..3074235b405d 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -429,5 +429,4 @@ module_init(init_tis);
> module_exit(cleanup_tis);
> MODULE_AUTHOR("Leendert van Doorn (leendert@watson.ibm.com)");
> MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 92c51c6cfd1b..20f4b2c7ea52 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -1164,5 +1164,4 @@ EXPORT_SYMBOL_GPL(tpm_tis_resume);
>
> MODULE_AUTHOR("Leendert van Doorn (leendert@watson.ibm.com)");
> MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_vtpm_proxy.c b/drivers/char/tpm/tpm_vtpm_proxy.c
> index 91c772e38bb5..18f14162d1c1 100644
> --- a/drivers/char/tpm/tpm_vtpm_proxy.c
> +++ b/drivers/char/tpm/tpm_vtpm_proxy.c
> @@ -729,5 +729,4 @@ module_exit(vtpm_module_exit);
>
> MODULE_AUTHOR("Stefan Berger (stefanb@us.ibm.com)");
> MODULE_DESCRIPTION("vTPM Driver");
> -MODULE_VERSION("0.1");
> MODULE_LICENSE("GPL");
> --
> 2.11.0
>
>
Thanks.
/Jarkko
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-12-02 16:29 UTC
To: Andy Lutomirski
Cc: linux-arch, Arnd Bergmann, x86, linux-kernel, Nicholas Piggin,
linux-mm, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <BA2FB4C0-55EA-481A-824C-95B94EA29FAB@amacapital.net>
On Wed, Dec 02, 2020 at 06:38:12AM -0800, Andy Lutomirski wrote:
>
> > On Dec 2, 2020, at 6:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> >> + * - A delayed freeing and RCU-like quiescing sequence based on
> >> + * mm switching to avoid IPIs completely.
> >
> > That one's interesting too. so basically you want to count switch_mm()
> > invocations on each CPU. Then, periodically snapshot the counter on each
> > CPU, and when they've all changed, increment a global counter.
> >
> > Then, you snapshot the global counter and wait for it to increment
> > (twice I think, the first increment might already be in progress).
> >
> > The only question here is what should drive this machinery.. the tick
> > probably.
> >
> > This shouldn't be too hard to do I think.
> >
> > Something a little like so perhaps?
>
> I don’t think this will work. A CPU can go idle with lazy mm and nohz
> forever. This could lead to unbounded memory use on a lightly loaded
> system.
Hurm.. quite so indeed. Fixing that seems to end up with requiring that
other proposal, such that we can tell which CPU has what active_mm
stuck.
Also, more complicated... :/
^ permalink raw reply
* Re: [PATCH v2 14/17] ibmvfc: add cancel mad initialization helper
From: Brian King @ 2020-12-02 16:00 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-15-tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses
From: Brian King @ 2020-12-02 15:56 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-7-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> + * things we send. Make sure this response is to something we
> + * actually sent
> + */
> + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(&evt->free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(&evt->timer);
> + list_del(&evt->queue);
> + ibmvfc_trc_end(evt);
Another thought here... If ibmvfc_purge_requests runs at the same time as this
code, you could check the free bit above and see it clear, then have
ibmvfc_purge_requests put the event on the free queue and call scsi_done. You
then come down and get the host lock here, remove the command from the free list,
and call the done function again, which could result in a double completion to
the scsi layer.
I think you need to grab the host lock before you check the free bit to avoid this race.
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +
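To illustrate the ordering suggested above, here is a minimal user-space
sketch. It is not driver code: struct demo_event, demo_host_lock and
demo_complete_event() are hypothetical stand-ins, a pthread mutex plays the
role of the host lock, and a counter stands in for evt->done(). The point is
only that the free check and the claim happen inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an event; not the driver's real structure. */
struct demo_event {
	bool free;		/* true once the event is back on the free list */
	int completions;	/* how many times the completion path ran */
};

/* Plays the role of the host lock in this sketch. */
pthread_mutex_t demo_host_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns true if this caller claimed and completed the event, false if
 * another path (a purge, say) had already freed it.
 */
bool demo_complete_event(struct demo_event *evt)
{
	bool claimed = false;

	pthread_mutex_lock(&demo_host_lock);
	if (!evt->free) {		/* checked under the lock, not before it */
		evt->free = true;	/* claimed while still holding the lock */
		claimed = true;
	}
	pthread_mutex_unlock(&demo_host_lock);

	if (claimed)
		evt->completions++;	/* stands in for evt->done(evt) */

	return claimed;
}
```

With this shape, whichever path takes the lock first completes the event and
the other sees it already freed, so the completion runs at most once.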
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v2 09/17] ibmvfc: implement channel enquiry and setup commands
From: Brian King @ 2020-12-02 15:51 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-10-tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses
From: Brian King @ 2020-12-02 15:46 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-7-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> The logic for iterating over the Sub-CRQ responses is similar to that
> of the primary CRQ. Add the necessary handlers for processing those
> responses.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 77 ++++++++++++++++++++++++++++++++++
> 1 file changed, 77 insertions(+)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 97f00fefa809..e9da3f60c793 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3381,6 +3381,83 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable)
> return rc;
> }
>
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> + * things we send. Make sure this response is to something we
> + * actually sent
> + */
> + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(&evt->free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(&evt->timer);
> + list_del(&evt->queue);
> + ibmvfc_trc_end(evt);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +
> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
> +{
> + struct ibmvfc_crq *crq;
> +
> + crq = &scrq->msgs[scrq->cur].crq;
> + if (crq->valid & 0x80) {
> + if (++scrq->cur == scrq->size)
You are incrementing the cur pointer without any locks held. Although
unlikely, could you also be in ibmvfc_reset_crq in another thread?
If so, you'd have a subtle race condition here where the cur pointer could
be read, then ibmvfc_reset_crq writes it to zero, then this thread
writes it to a non zero value, which would then cause you to be out of
sync with the VIOS as to where the cur pointer is.
> + scrq->cur = 0;
> + rmb();
> + } else
> + crq = NULL;
> +
> + return crq;
> +}
> +
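One possible way to close that window, sketched in user space under
assumptions: the cursor is only ever read or written while holding a lock that
the reset path also takes. The names struct demo_ring, demo_ring_next() and
demo_ring_reset() are hypothetical, and a pthread mutex stands in for whatever
lock the driver would actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_RING_SIZE	4
#define DEMO_VALID	0x80

struct demo_msg {
	unsigned char valid;
};

/* Hypothetical ring; the lock protects both cur and the reset path. */
struct demo_ring {
	pthread_mutex_t lock;
	size_t cur;
	struct demo_msg msgs[DEMO_RING_SIZE];
};

void demo_ring_init(struct demo_ring *ring)
{
	pthread_mutex_init(&ring->lock, NULL);
	ring->cur = 0;
	for (size_t i = 0; i < DEMO_RING_SIZE; i++)
		ring->msgs[i].valid = 0;
}

/* Return the next valid message and advance the cursor, or NULL if none. */
struct demo_msg *demo_ring_next(struct demo_ring *ring)
{
	struct demo_msg *msg;

	pthread_mutex_lock(&ring->lock);
	msg = &ring->msgs[ring->cur];
	if (msg->valid & DEMO_VALID) {
		if (++ring->cur == DEMO_RING_SIZE)
			ring->cur = 0;
	} else {
		msg = NULL;
	}
	pthread_mutex_unlock(&ring->lock);

	return msg;
}

/* Reset takes the same lock, so it cannot interleave with an advance. */
void demo_ring_reset(struct demo_ring *ring)
{
	pthread_mutex_lock(&ring->lock);
	for (size_t i = 0; i < DEMO_RING_SIZE; i++)
		ring->msgs[i].valid = 0;
	ring->cur = 0;
	pthread_mutex_unlock(&ring->lock);
}
```

Because the read-modify-write of cur and the reset to zero are serialized, a
stale non-zero value cannot be written back after a reset.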
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v2 04/17] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
From: Brian King @ 2020-12-02 15:25 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-5-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
> + int index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
> + int rc = -ENOMEM;
> +
> + ENTER;
> +
> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
> + if (!scrq->msgs)
> + return rc;
> +
> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
> + DMA_BIDIRECTIONAL);
> +
> + if (dma_mapping_error(dev, scrq->msg_token))
> + goto dma_map_failed;
> +
> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
> + &scrq->cookie, &scrq->hw_irq);
> +
> + if (rc) {
> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
> + dev_warn(dev, "Firmware may not support MQ\n");
Will this now get logged everywhere this new driver runs if the firmware
does not support sub CRQs? Is there something better that could be done
here to only log this for a true error and not just because a new driver
is running with an older firmware release?
> + goto reg_failed;
> + }
> +
> + scrq->hwq_id = index;
> + scrq->vhost = vhost;
> +
> + LEAVE;
> + return 0;
> +
> +reg_failed:
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +dma_map_failed:
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> + return rc;
> +}
> +
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement
From: Brian King @ 2020-12-02 15:14 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201202005329.4538-2-tyreld@linux.ibm.com>
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 9 ++++++++-
> drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 42e4d35e0d35..f1d677a7423d 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
> }
>
> shost->transportt = ibmvfc_transport_template;
> - shost->can_queue = max_requests;
> + shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.
> shost->max_lun = max_lun;
> shost->max_id = max_targets;
> shost->max_sectors = IBMVFC_MAX_SECTORS;
> shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
> shost->unique_id = shost->host_no;
> + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>
> vhost = shost_priv(shost);
> INIT_LIST_HEAD(&vhost->sent);
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v9 0/6] KASAN for powerpc64 radix
From: Andrey Konovalov @ 2020-12-02 15:11 UTC (permalink / raw)
To: Daniel Axtens
Cc: Christophe Leroy, Aneesh Kumar K.V, LKML, kasan-dev,
Linux Memory Management List, PowerPC
In-Reply-To: <20201201161632.1234753-1-dja@axtens.net>
On Tue, Dec 1, 2020 at 5:16 PM Daniel Axtens <dja@axtens.net> wrote:
>
> Building on the work of Christophe, Aneesh and Balbir, I've ported
> KASAN to 64-bit Book3S kernels running on the Radix MMU.
>
> This is a significant reworking of the previous versions. Instead of
> the previous approach which supported inline instrumentation, this
> series provides only outline instrumentation.
>
> To get around the problem of accessing the shadow region inside code we run
> with translations off (in 'real mode'), we restrict checking to when
> translations are enabled. This is done via a new hook in the kasan core and
> by excluding larger quantities of arch code from instrumentation. The upside
> is that we no longer require that you be able to specify the amount of
> physically contiguous memory on the system at compile time. Hopefully this
> is a better trade-off. More details in patch 6.
>
> kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
> nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
> fragile - if any real mode code paths call out to instrumented code, things
> will go boom.
>
> There are 4 failing KUnit tests:
>
> kasan_stack_oob, kasan_alloca_oob_left & kasan_alloca_oob_right - these are
> due to not supporting inline instrumentation.
>
> kasan_global_oob - gcc puts the ASAN init code in a section called
> '.init_array'. Powerpc64 module loading code goes through and _renames_ any
> section beginning with '.init' to begin with '_init' in order to avoid some
> complexities around our 24-bit indirect jumps. This means it renames
> '.init_array' to '_init_array', and the generic module loading code then
> fails to recognise the section as a constructor and thus doesn't run
> it. This hack dates back to 2003 and so I'm not going to try to unpick it
> in this series. (I suspect this may have previously worked if the code
> ended up in .ctors rather than .init_array but I don't keep my old binaries
> around so I have no real way of checking.)
Hi Daniel,
Just FYI: there's a number of KASAN-related patches in the mm tree
right now, so this series will need to be rebased. Onto mm or onto
5.11-rc1 once it's been released.
Thanks!
^ permalink raw reply
* Re: [PATCH kernel v3] powerpc/pci: Remove LSI mappings on device teardown
From: Cédric Le Goater @ 2020-12-02 14:47 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20201202005222.5477-1-aik@ozlabs.ru>
On 12/2/20 1:52 AM, Alexey Kardashevskiy wrote:
> From: Oliver O'Halloran <oohall@gmail.com>
>
> When a passthrough IO adapter is removed from a pseries machine using hash
> MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
> to clear all page table entries related to the adapter. If some are still
> present, the RTAS call which isolates the PCI slot returns error 9001
> "valid outstanding translations" and the removal of the IO adapter fails.
> This is because when the PHBs are scanned, Linux maps automatically the
> INTx interrupts in the Linux interrupt number space but these are never
> removed.
>
> This problem can be fixed by adding the corresponding unmap operation when
> the device is removed. There's no pcibios_* hook for the remove case, but
> the same effect can be achieved using a bus notifier.
>
> Because INTx are shared among PHBs (and potentially across the system),
> this adds tracking of virq to unmap them only when the last user is gone.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> [aik: added refcounter]
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
I did some PHB hotplug tests on a KVM guest and a LPAR using only LSIs.
Tested-by: Cédric Le Goater <clg@kaod.org>
Thanks Alexey,
C.
> ---
> Changes:
> v3:
> * free @vi on error path
>
> v2:
> * added refcounter
> ---
> arch/powerpc/kernel/pci-common.c | 82 ++++++++++++++++++++++++++++++--
> 1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index be108616a721..2b555997b295 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,55 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
> return NULL;
> }
>
> +struct pci_intx_virq {
> + int virq;
> + struct kref kref;
> + struct list_head list_node;
> +};
> +
> +static LIST_HEAD(intx_list);
> +static DEFINE_MUTEX(intx_mutex);
> +
> +static void ppc_pci_intx_release(struct kref *kref)
> +{
> + struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, kref);
> +
> + list_del(&vi->list_node);
> + irq_dispose_mapping(vi->virq);
> + kfree(vi);
> +}
> +
> +static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(data);
> +
> + if (action == BUS_NOTIFY_DEL_DEVICE) {
> + struct pci_intx_virq *vi;
> +
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vi, &intx_list, list_node) {
> + if (vi->virq == pdev->irq) {
> + kref_put(&vi->kref, ppc_pci_intx_release);
> + break;
> + }
> + }
> + mutex_unlock(&intx_mutex);
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block ppc_pci_unmap_irq_notifier = {
> + .notifier_call = ppc_pci_unmap_irq_line,
> +};
> +
> +static int ppc_pci_register_irq_notifier(void)
> +{
> + return bus_register_notifier(&pci_bus_type, &ppc_pci_unmap_irq_notifier);
> +}
> +arch_initcall(ppc_pci_register_irq_notifier);
> +
> /*
> * Reads the interrupt pin to determine if interrupt is use by card.
> * If the interrupt is used, then gets the interrupt line from the
> @@ -361,6 +410,12 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
> static int pci_read_irq_line(struct pci_dev *pci_dev)
> {
> int virq;
> + struct pci_intx_virq *vi, *vitmp;
> +
> + /* Preallocate vi as rewind is complex if this fails after mapping */
> + vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
> + if (!vi)
> + return -1;
>
> pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
>
> @@ -377,12 +432,12 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
> * function.
> */
> if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &pin))
> - return -1;
> + goto error_exit;
> if (pin == 0)
> - return -1;
> + goto error_exit;
> if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line) ||
> line == 0xff || line == 0) {
> - return -1;
> + goto error_exit;
> }
> pr_debug(" No map ! Using line %d (pin %d) from PCI config\n",
> line, pin);
> @@ -394,14 +449,33 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>
> if (!virq) {
> pr_debug(" Failed to map !\n");
> - return -1;
> + goto error_exit;
> }
>
> pr_debug(" Mapped to linux irq %d\n", virq);
>
> pci_dev->irq = virq;
>
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vitmp, &intx_list, list_node) {
> + if (vitmp->virq == virq) {
> + kref_get(&vitmp->kref);
> + kfree(vi);
> + vi = NULL;
> + break;
> + }
> + }
> + if (vi) {
> + vi->virq = virq;
> + kref_init(&vi->kref);
> + list_add_tail(&vi->list_node, &intx_list);
> + }
> + mutex_unlock(&intx_mutex);
> +
> return 0;
> +error_exit:
> + kfree(vi);
> + return -1;
> }
>
> /*
>
^ permalink raw reply
* Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m
From: Uladzislau Rezki @ 2020-12-02 14:39 UTC (permalink / raw)
To: Michael Ellerman, Paul E . McKenney
Cc: rcu, linuxppc-dev, Paul E . McKenney, Daniel Axtens
In-Reply-To: <87v9dkuwy3.fsf@mpe.ellerman.id.au>
On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote:
> Daniel Axtens <dja@axtens.net> writes:
> > Hi all,
> >
> > I'm having some difficulty tracking down a bug.
> >
> > Some configurations of the powerpc kernel since somewhere in the 5.10
> > merge window fail to boot on some ppc64 systems. They hang while trying
> > to bring up SMP. It seems to depend on the RCU_SCALE/PERF_TEST option.
> > (It was renamed in the 5.10 merge window.)
> >
> > I can reproduce it as follows with qemu tcg:
> >
> > make -j64 pseries_le_defconfig
> > scripts/config -m RCU_SCALE_TEST
> > scripts/config -m RCU_PERF_TEST
> > make -j 64 vmlinux CC="ccache gcc"
> >
> > qemu-system-ppc64 -cpu power9 -M pseries -m 1G -nographic -vga none -smp 4 -kernel vmlinux
> >
> > ...
> > [ 0.036284][ T0] Mount-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> > [ 0.036481][ T0] Mountpoint-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> > [ 0.148168][ T1] POWER9 performance monitor hardware support registered
> > [ 0.151118][ T1] rcu: Hierarchical SRCU implementation.
> > [ 0.186660][ T1] smp: Bringing up secondary CPUs ...
> > <hangs>
>
> One does not simply hang :)
>
> > I have no idea why RCU_SCALE/PERF_TEST would be causing this, but that
> > seems to be what does it: if I don't set that, the kernel boots fine.
>
> It seems to be TASKS_RCU that is the key.
>
> I don't need RCU_SCALE_TEST enabled, I can trigger it just with the
> following applied:
>
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 0ebe15a84985..f3500c95d6a1 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -78,7 +78,7 @@ config TASKS_RCU_GENERIC
> task-based RCU implementations. Not for manual selection.
>
> config TASKS_RCU
> - def_bool PREEMPTION
> + def_bool y
> help
> This option enables a task-based RCU implementation that uses
> only voluntary context switch (not preemption!), idle, and
>
>
> And bisect points to:
> 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
>
> Which moved init_kprobes() prior to SMP bringup.
>
>
> For some reason when it gets stuck sysrq doesn't work, but I was able to
> get it into gdb and manually call handle_sysrq('t') to get the output
> below.
>
> The SMP bringup stalls because _cpu_up() is blocked trying to take
> cpu_hotplug_lock for writing:
>
> [ 401.403132][ T0] task:swapper/0 state:D stack:12512 pid: 1 ppid: 0 flags:0x00000800
> [ 401.403502][ T0] Call Trace:
> [ 401.403907][ T0] [c0000000062c37d0] [c0000000062c3830] 0xc0000000062c3830 (unreliable)
> [ 401.404068][ T0] [c0000000062c39b0] [c000000000019d70] __switch_to+0x2e0/0x4a0
> [ 401.404189][ T0] [c0000000062c3a10] [c000000000b87228] __schedule+0x288/0x9b0
> [ 401.404257][ T0] [c0000000062c3ad0] [c000000000b879b8] schedule+0x68/0x120
> [ 401.404324][ T0] [c0000000062c3b00] [c000000000184ad4] percpu_down_write+0x164/0x170
> [ 401.404390][ T0] [c0000000062c3b50] [c000000000116b68] _cpu_up+0x68/0x280
> [ 401.404475][ T0] [c0000000062c3bb0] [c000000000116e70] cpu_up+0xf0/0x140
> [ 401.404546][ T0] [c0000000062c3c30] [c00000000011776c] bringup_nonboot_cpus+0xac/0xf0
> [ 401.404643][ T0] [c0000000062c3c80] [c000000000eea1b8] smp_init+0x40/0xcc
> [ 401.404727][ T0] [c0000000062c3ce0] [c000000000ec43dc] kernel_init_freeable+0x1e0/0x3a0
> [ 401.404799][ T0] [c0000000062c3db0] [c000000000011ec4] kernel_init+0x24/0x150
> [ 401.404958][ T0] [c0000000062c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
>
> It can't get it because kprobe_optimizer() has taken it for read and is now
> blocked waiting for synchronize_rcu_tasks():
>
> [ 401.418808][ T0] task:kworker/0:1 state:D stack:13392 pid: 12 ppid: 2 flags:0x00000800
> [ 401.418951][ T0] Workqueue: events kprobe_optimizer
> [ 401.419078][ T0] Call Trace:
> [ 401.419121][ T0] [c0000000062ef650] [c0000000062ef710] 0xc0000000062ef710 (unreliable)
> [ 401.419213][ T0] [c0000000062ef830] [c000000000019d70] __switch_to+0x2e0/0x4a0
> [ 401.419281][ T0] [c0000000062ef890] [c000000000b87228] __schedule+0x288/0x9b0
> [ 401.419347][ T0] [c0000000062ef950] [c000000000b879b8] schedule+0x68/0x120
> [ 401.419415][ T0] [c0000000062ef980] [c000000000b8e664] schedule_timeout+0x2a4/0x340
> [ 401.419484][ T0] [c0000000062efa80] [c000000000b894ec] wait_for_completion+0x9c/0x170
> [ 401.419552][ T0] [c0000000062efae0] [c0000000001ac85c] __wait_rcu_gp+0x19c/0x210
> [ 401.419619][ T0] [c0000000062efb40] [c0000000001ac90c] synchronize_rcu_tasks_generic+0x3c/0x70
> [ 401.419690][ T0] [c0000000062efbe0] [c00000000022a3dc] kprobe_optimizer+0x1dc/0x470
> [ 401.419757][ T0] [c0000000062efc60] [c000000000136684] process_one_work+0x2f4/0x530
> [ 401.419823][ T0] [c0000000062efd20] [c000000000138d28] worker_thread+0x78/0x570
> [ 401.419891][ T0] [c0000000062efdb0] [c000000000142424] kthread+0x194/0x1a0
> [ 401.419976][ T0] [c0000000062efe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
>
> But why is the synchronize_rcu_tasks() not completing?
>
I think that it is because RCU is not fully initialized by that time.
The 36dadef23fcc ("kprobes: Init kprobes in early_initcall") patch
switches to early_initcall(), which runs earlier in the boot sequence than
core_initcall(), and core_initcall() is what completes RCU setup in
rcu_set_runtime_mode().
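The ordering argument above can be modelled with a small user-space sketch.
This is only an illustration of the hypothesis, not kernel code: the demo_*
functions and the table are hypothetical, and the table just runs an "early"
entry before a "core" entry the way the initcall levels are ordered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

bool demo_rcu_runtime_ready;		/* set by the "core" level callback */
bool demo_kprobes_saw_rcu_ready;	/* what the "early" callback observed */

/* Stands in for the core_initcall() that finishes RCU setup. */
int demo_rcu_set_runtime_mode(void)
{
	demo_rcu_runtime_ready = true;
	return 0;
}

/* Stands in for kprobes init once it has moved to the early level. */
int demo_init_kprobes(void)
{
	demo_kprobes_saw_rcu_ready = demo_rcu_runtime_ready;
	return 0;
}

typedef int (*demo_initcall_t)(void);

/* Callbacks listed in level order: early first, then core. */
static const demo_initcall_t demo_initcalls[] = {
	demo_init_kprobes,		/* "early" level */
	demo_rcu_set_runtime_mode,	/* "core" level */
};

void demo_do_initcalls(void)
{
	size_t n = sizeof(demo_initcalls) / sizeof(demo_initcalls[0]);

	for (size_t i = 0; i < n; i++)
		demo_initcalls[i]();
}
```

After demo_do_initcalls(), the early callback has recorded that the runtime
mode was not yet set, which is the situation described above.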
--
Vlad Rezki
^ permalink raw reply
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-12-02 14:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-arch, Arnd Bergmann, x86, linux-kernel, Nicholas Piggin,
linux-mm, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <20201202141957.GJ3021@hirez.programming.kicks-ass.net>
> On Dec 2, 2020, at 6:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
>> + * - A delayed freeing and RCU-like quiescing sequence based on
>> + * mm switching to avoid IPIs completely.
>
> That one's interesting too. so basically you want to count switch_mm()
> invocations on each CPU. Then, periodically snapshot the counter on each
> CPU, and when they've all changed, increment a global counter.
>
> Then, you snapshot the global counter and wait for it to increment
> (twice I think, the first increment might already be in progress).
>
> The only question here is what should drive this machinery.. the tick
> probably.
>
> This shouldn't be too hard to do I think.
>
> Something a little like so perhaps?
I don’t think this will work. A CPU can go idle with lazy mm and nohz forever. This could lead to unbounded memory use on a lightly loaded system.
^ permalink raw reply
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-12-02 14:19 UTC (permalink / raw)
To: Nicholas Piggin
Cc: linux-arch, Arnd Bergmann, x86, linux-kernel, linux-mm,
Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <20201128160141.1003903-7-npiggin@gmail.com>
On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> + * - A delayed freeing and RCU-like quiescing sequence based on
> + * mm switching to avoid IPIs completely.
That one's interesting too. so basically you want to count switch_mm()
invocations on each CPU. Then, periodically snapshot the counter on each
CPU, and when they've all changed, increment a global counter.
Then, you snapshot the global counter and wait for it to increment
(twice I think, the first increment might already be in progress).
The only question here is what should drive this machinery.. the tick
probably.
This shouldn't be too hard to do I think.
Something a little like so perhaps?
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..27b64a60a468 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4525,6 +4525,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
* finish_task_switch()'s mmdrop().
*/
switch_mm_irqs_off(prev->active_mm, next->mm, next);
+ rq->nr_mm_switches++;
if (!prev->mm) { // from kernel
/* will mmdrop() in finish_task_switch(). */
@@ -4739,6 +4740,80 @@ unsigned long long task_sched_runtime(struct task_struct *p)
return ns;
}
+static DEFINE_PER_CPU(unsigned long[2], mm_switches);
+
+static struct {
+ unsigned long __percpu *switches[2];
+ unsigned long generation;
+ atomic_t complete;
+ struct wait_queue_head wait;
+} mm_foo = {
+ .switches = &mm_switches,
+ .generation = 0,
+ .complete = -1, // XXX bootstrap, hotplug
+ .wait = __WAIT_QUEUE_HEAD_INITIALIZER(mm_foo.wait),
+};
+
+static bool mm_gen_tick(int cpu, struct rq *rq)
+{
+ unsigned long prev, curr, switches = rq->nr_mm_switches;
+ int idx = READ_ONCE(mm_foo.generation) & 1;
+
+ /* DATA-DEP on mm_foo.generation */
+
+ prev = __this_cpu_read(mm_foo.switches[idx^1]);
+ curr = __this_cpu_read(mm_foo.switches[idx]);
+
+ /* we haven't switched since the last generation */
+ if (prev == switches)
+ return false;
+
+ __this_cpu_write(mm_foo.switches[idx], switches);
+
+ /*
+ * If @curr is less than @prev, this is the first update of
+ * this generation, per the above, switches has also increased since,
+ * so mark our CPU complete.
+ */
+ if ((long)(curr - prev) < 0 && atomic_dec_and_test(&mm_foo.complete)) {
+ /*
+ * All CPUs are complete, IOW they all switched at least once
+ * since the last generation. Reset the completion counter and
+ * increment the generation.
+ */
+ atomic_set(&mm_foo.complete, num_online_cpus());
+ /*
+ * Matches the address dependency above:
+ *
+ * idx = gen & 1 complete = nr_cpus
+ * <DATA-DEP> <WMB>
+ * curr = sw[idx] generation++;
+ * prev = sw[idx^1]
+ * if (curr < prev)
+ * complete--
+ *
+ * If we don't observe the new generation; we'll not decrement. If we
+ * do see the new generation, we must also see the new completion count.
+ */
+ smp_wmb();
+ mm_foo.generation++;
+ return true;
+ }
+
+ return false;
+}
+
+static void mm_gen_wake(void)
+{
+ wake_up_all(&mm_foo.wait);
+}
+
+static void mm_gen_wait(void)
+{
+ unsigned int gen = READ_ONCE(mm_foo.generation);
+ wait_event(&mm_foo.wait, READ_ONCE(mm_foo.generation) - gen > 1);
+}
+
/*
* This function gets called by the timer code, with HZ frequency.
* We call it with interrupts disabled.
@@ -4750,6 +4825,7 @@ void scheduler_tick(void)
struct task_struct *curr = rq->curr;
struct rq_flags rf;
unsigned long thermal_pressure;
+ bool wake_mm_gen;
arch_scale_freq_tick();
sched_clock_tick();
@@ -4763,8 +4839,13 @@ void scheduler_tick(void)
calc_global_load_tick(rq);
psi_task_tick(rq);
+ wake_mm_gen = mm_gen_tick(cpu, rq);
+
rq_unlock(rq, &rf);
+ if (wake_mm_gen)
+ mm_gen_wake();
+
perf_event_task_tick();
#ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf9d8da7d35e..62fb685db8d0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -927,6 +927,7 @@ struct rq {
unsigned int ttwu_pending;
#endif
u64 nr_switches;
+ u64 nr_mm_switches;
#ifdef CONFIG_UCLAMP_TASK
/* Utilization clamp values based on CPU's RUNNABLE tasks */
^ permalink raw reply related
* Re: [PATCH kernel v3] powerpc/pci: Remove LSI mappings on device teardown
From: Frederic Barrat @ 2020-12-02 14:17 UTC (permalink / raw)
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Oliver O'Halloran, Cédric Le Goater
In-Reply-To: <20201202005222.5477-1-aik@ozlabs.ru>
On 02/12/2020 01:52, Alexey Kardashevskiy wrote:
> From: Oliver O'Halloran <oohall@gmail.com>
>
> When a passthrough IO adapter is removed from a pseries machine using hash
> MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
> to clear all page table entries related to the adapter. If some are still
> present, the RTAS call which isolates the PCI slot returns error 9001
> "valid outstanding translations" and the removal of the IO adapter fails.
> This is because when the PHBs are scanned, Linux maps automatically the
> INTx interrupts in the Linux interrupt number space but these are never
> removed.
>
> This problem can be fixed by adding the corresponding unmap operation when
> the device is removed. There's no pcibios_* hook for the remove case, but
> the same effect can be achieved using a bus notifier.
>
> Because INTx are shared among PHBs (and potentially across the system),
> this adds tracking of virq to unmap them only when the last user is gone.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> [aik: added refcounter]
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
Looks ok to me.
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Changes:
> v3:
> * free @vi on error path
>
> v2:
> * added refcounter
> ---
> arch/powerpc/kernel/pci-common.c | 82 ++++++++++++++++++++++++++++++--
> 1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index be108616a721..2b555997b295 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,55 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
> return NULL;
> }
>
> +struct pci_intx_virq {
> + int virq;
> + struct kref kref;
> + struct list_head list_node;
> +};
> +
> +static LIST_HEAD(intx_list);
> +static DEFINE_MUTEX(intx_mutex);
> +
> +static void ppc_pci_intx_release(struct kref *kref)
> +{
> + struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, kref);
> +
> + list_del(&vi->list_node);
> + irq_dispose_mapping(vi->virq);
> + kfree(vi);
> +}
> +
> +static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(data);
> +
> + if (action == BUS_NOTIFY_DEL_DEVICE) {
> + struct pci_intx_virq *vi;
> +
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vi, &intx_list, list_node) {
> + if (vi->virq == pdev->irq) {
> + kref_put(&vi->kref, ppc_pci_intx_release);
> + break;
> + }
> + }
> + mutex_unlock(&intx_mutex);
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block ppc_pci_unmap_irq_notifier = {
> + .notifier_call = ppc_pci_unmap_irq_line,
> +};
> +
> +static int ppc_pci_register_irq_notifier(void)
> +{
> + return bus_register_notifier(&pci_bus_type, &ppc_pci_unmap_irq_notifier);
> +}
> +arch_initcall(ppc_pci_register_irq_notifier);
> +
> /*
> * Reads the interrupt pin to determine if interrupt is use by card.
> * If the interrupt is used, then gets the interrupt line from the
> @@ -361,6 +410,12 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
> static int pci_read_irq_line(struct pci_dev *pci_dev)
> {
> int virq;
> + struct pci_intx_virq *vi, *vitmp;
> +
> + /* Preallocate vi as rewind is complex if this fails after mapping */
> + vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
> + if (!vi)
> + return -1;
>
> pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
>
> @@ -377,12 +432,12 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
> * function.
> */
> if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &pin))
> - return -1;
> + goto error_exit;
> if (pin == 0)
> - return -1;
> + goto error_exit;
> if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line) ||
> line == 0xff || line == 0) {
> - return -1;
> + goto error_exit;
> }
> pr_debug(" No map ! Using line %d (pin %d) from PCI config\n",
> line, pin);
> @@ -394,14 +449,33 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>
> if (!virq) {
> pr_debug(" Failed to map !\n");
> - return -1;
> + goto error_exit;
> }
>
> pr_debug(" Mapped to linux irq %d\n", virq);
>
> pci_dev->irq = virq;
>
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vitmp, &intx_list, list_node) {
> + if (vitmp->virq == virq) {
> + kref_get(&vitmp->kref);
> + kfree(vi);
> + vi = NULL;
> + break;
> + }
> + }
> + if (vi) {
> + vi->virq = virq;
> + kref_init(&vi->kref);
> + list_add_tail(&vi->list_node, &intx_list);
> + }
> + mutex_unlock(&intx_mutex);
> +
> return 0;
> +error_exit:
> + kfree(vi);
> + return -1;
> }
>
> /*
>
* Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m
From: Michael Ellerman @ 2020-12-02 14:03 UTC (permalink / raw)
To: Daniel Axtens, rcu, linuxppc-dev, Paul E . McKenney
In-Reply-To: <87eekfh80a.fsf@dja-thinkpad.axtens.net>
Daniel Axtens <dja@axtens.net> writes:
> Hi all,
>
> I'm having some difficulty tracking down a bug.
>
> Some configurations of the powerpc kernel since somewhere in the 5.10
> merge window fail to boot on some ppc64 systems. They hang while trying
> to bring up SMP. It seems to depend on the RCU_SCALE/PERF_TEST option.
> (It was renamed in the 5.10 merge window.)
>
> I can reproduce it as follows with qemu tcg:
>
> make -j64 pseries_le_defconfig
> scripts/config -m RCU_SCALE_TEST
> scripts/config -m RCU_PERF_TEST
> make -j 64 vmlinux CC="ccache gcc"
>
> qemu-system-ppc64 -cpu power9 -M pseries -m 1G -nographic -vga none -smp 4 -kernel vmlinux
>
> ...
> [ 0.036284][ T0] Mount-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> [ 0.036481][ T0] Mountpoint-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> [ 0.148168][ T1] POWER9 performance monitor hardware support registered
> [ 0.151118][ T1] rcu: Hierarchical SRCU implementation.
> [ 0.186660][ T1] smp: Bringing up secondary CPUs ...
> <hangs>
One does not simply hang :)
> I have no idea why RCU_SCALE/PERF_TEST would be causing this, but that
> seems to be what does it: if I don't set that, the kernel boots fine.
It seems to be TASKS_RCU that is the key.
I don't need RCU_SCALE_TEST enabled, I can trigger it just with the
following applied:
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 0ebe15a84985..f3500c95d6a1 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -78,7 +78,7 @@ config TASKS_RCU_GENERIC
task-based RCU implementations. Not for manual selection.
config TASKS_RCU
- def_bool PREEMPTION
+ def_bool y
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
And bisect points to:
36dadef23fcc ("kprobes: Init kprobes in early_initcall")
Which moved init_kprobes() prior to SMP bringup.
For some reason when it gets stuck sysrq doesn't work, but I was able to
get it into gdb and manually call handle_sysrq('t') to get the output
below.
The SMP bringup stalls because _cpu_up() is blocked trying to take
cpu_hotplug_lock for writing:
[ 401.403132][ T0] task:swapper/0 state:D stack:12512 pid: 1 ppid: 0 flags:0x00000800
[ 401.403502][ T0] Call Trace:
[ 401.403907][ T0] [c0000000062c37d0] [c0000000062c3830] 0xc0000000062c3830 (unreliable)
[ 401.404068][ T0] [c0000000062c39b0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.404189][ T0] [c0000000062c3a10] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.404257][ T0] [c0000000062c3ad0] [c000000000b879b8] schedule+0x68/0x120
[ 401.404324][ T0] [c0000000062c3b00] [c000000000184ad4] percpu_down_write+0x164/0x170
[ 401.404390][ T0] [c0000000062c3b50] [c000000000116b68] _cpu_up+0x68/0x280
[ 401.404475][ T0] [c0000000062c3bb0] [c000000000116e70] cpu_up+0xf0/0x140
[ 401.404546][ T0] [c0000000062c3c30] [c00000000011776c] bringup_nonboot_cpus+0xac/0xf0
[ 401.404643][ T0] [c0000000062c3c80] [c000000000eea1b8] smp_init+0x40/0xcc
[ 401.404727][ T0] [c0000000062c3ce0] [c000000000ec43dc] kernel_init_freeable+0x1e0/0x3a0
[ 401.404799][ T0] [c0000000062c3db0] [c000000000011ec4] kernel_init+0x24/0x150
[ 401.404958][ T0] [c0000000062c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
It can't get it because kprobe_optimizer() has taken it for read and is now
blocked waiting for synchronize_rcu_tasks():
[ 401.418808][ T0] task:kworker/0:1 state:D stack:13392 pid: 12 ppid: 2 flags:0x00000800
[ 401.418951][ T0] Workqueue: events kprobe_optimizer
[ 401.419078][ T0] Call Trace:
[ 401.419121][ T0] [c0000000062ef650] [c0000000062ef710] 0xc0000000062ef710 (unreliable)
[ 401.419213][ T0] [c0000000062ef830] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.419281][ T0] [c0000000062ef890] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.419347][ T0] [c0000000062ef950] [c000000000b879b8] schedule+0x68/0x120
[ 401.419415][ T0] [c0000000062ef980] [c000000000b8e664] schedule_timeout+0x2a4/0x340
[ 401.419484][ T0] [c0000000062efa80] [c000000000b894ec] wait_for_completion+0x9c/0x170
[ 401.419552][ T0] [c0000000062efae0] [c0000000001ac85c] __wait_rcu_gp+0x19c/0x210
[ 401.419619][ T0] [c0000000062efb40] [c0000000001ac90c] synchronize_rcu_tasks_generic+0x3c/0x70
[ 401.419690][ T0] [c0000000062efbe0] [c00000000022a3dc] kprobe_optimizer+0x1dc/0x470
[ 401.419757][ T0] [c0000000062efc60] [c000000000136684] process_one_work+0x2f4/0x530
[ 401.419823][ T0] [c0000000062efd20] [c000000000138d28] worker_thread+0x78/0x570
[ 401.419891][ T0] [c0000000062efdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.419976][ T0] [c0000000062efe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
But why is the synchronize_rcu_tasks() not completing?
Hopefully Paul can help there, otherwise I'll try and work out how to
dump some RCU state when it gets stuck.
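To make the dependency chain above concrete, here is a minimal single-threaded toy model, not kernel code. Every name in it (toy_rwlock, toy_optimizer_step, toy_cpu_up) is hypothetical and only stands in for cpu_hotplug_lock, kprobe_optimizer() and _cpu_up(). It assumes nothing about why the grace period stalls. It only shows that the writer cannot make progress until the reader's grace period ends:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock: a writer needs zero readers and no other writer. */
struct toy_rwlock {
	int readers;
	bool writer;
};

static bool toy_try_read(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	l->readers--;
}

static bool toy_try_write(struct toy_rwlock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	l->writer = false;
}

/* Hypothetical stand-in for the kprobe_optimizer() work item. */
enum opt_state { OPT_IDLE, OPT_WAITING_FOR_GP, OPT_DONE };

/* Take the lock for read, then hold it until the grace period is done. */
static void toy_optimizer_step(enum opt_state *st, struct toy_rwlock *l,
			       bool gp_done)
{
	switch (*st) {
	case OPT_IDLE:
		if (toy_try_read(l))
			*st = OPT_WAITING_FOR_GP;
		break;
	case OPT_WAITING_FOR_GP:
		if (gp_done) {
			toy_read_unlock(l);
			*st = OPT_DONE;
		}
		break;
	case OPT_DONE:
		break;
	}
}

/*
 * Hypothetical stand-in for _cpu_up(): returns the step at which the write
 * lock was obtained, or -1 if it never was. A negative gp_completes_at_step
 * models a grace period that never completes.
 */
static int toy_cpu_up(int gp_completes_at_step, int max_steps)
{
	struct toy_rwlock lock = { 0, false };
	enum opt_state opt = OPT_IDLE;

	for (int step = 0; step < max_steps; step++) {
		bool gp_done = gp_completes_at_step >= 0 &&
			       step >= gp_completes_at_step;

		toy_optimizer_step(&opt, &lock, gp_done);
		if (toy_try_write(&lock)) {
			toy_write_unlock(&lock);
			return step;
		}
	}
	return -1;
}
```

In this model toy_cpu_up(3, 10) gets the lock at step 3, while toy_cpu_up(-1, 1000) never does, which is the shape of the hang in the traces: the writer is only as live as the grace period the reader is waiting on.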
Full sysrq-t output below.
cheers
[ 401.402512][ T0] sysrq: Show State
[ 401.403132][ T0] task:swapper/0 state:D stack:12512 pid: 1 ppid: 0 flags:0x00000800
[ 401.403502][ T0] Call Trace:
[ 401.403907][ T0] [c0000000062c37d0] [c0000000062c3830] 0xc0000000062c3830 (unreliable)
[ 401.404068][ T0] [c0000000062c39b0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.404189][ T0] [c0000000062c3a10] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.404257][ T0] [c0000000062c3ad0] [c000000000b879b8] schedule+0x68/0x120
[ 401.404324][ T0] [c0000000062c3b00] [c000000000184ad4] percpu_down_write+0x164/0x170
[ 401.404390][ T0] [c0000000062c3b50] [c000000000116b68] _cpu_up+0x68/0x280
[ 401.404475][ T0] [c0000000062c3bb0] [c000000000116e70] cpu_up+0xf0/0x140
[ 401.404546][ T0] [c0000000062c3c30] [c00000000011776c] bringup_nonboot_cpus+0xac/0xf0
[ 401.404643][ T0] [c0000000062c3c80] [c000000000eea1b8] smp_init+0x40/0xcc
[ 401.404727][ T0] [c0000000062c3ce0] [c000000000ec43dc] kernel_init_freeable+0x1e0/0x3a0
[ 401.404799][ T0] [c0000000062c3db0] [c000000000011ec4] kernel_init+0x24/0x150
[ 401.404958][ T0] [c0000000062c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.405221][ T0] task:kthreadd state:S stack:13712 pid: 2 ppid: 0 flags:0x00000800
[ 401.405326][ T0] Call Trace:
[ 401.405380][ T0] [c0000000062c7a60] [c0000000062c7ac0] 0xc0000000062c7ac0 (unreliable)
[ 401.405473][ T0] [c0000000062c7c40] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.405565][ T0] [c0000000062c7ca0] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.405639][ T0] [c0000000062c7d60] [c000000000b879b8] schedule+0x68/0x120
[ 401.405720][ T0] [c0000000062c7d90] [c000000000143508] kthreadd+0x278/0x2f0
[ 401.405798][ T0] [c0000000062c7e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.405908][ T0] task:rcu_gp state:I stack:14576 pid: 3 ppid: 2 flags:0x00000800
[ 401.407471][ T0] Call Trace:
[ 401.407690][ T0] [c0000000062cba00] [c0000000062cba60] 0xc0000000062cba60 (unreliable)
[ 401.407851][ T0] [c0000000062cbbe0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.407952][ T0] [c0000000062cbc40] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.408037][ T0] [c0000000062cbd00] [c000000000b879b8] schedule+0x68/0x120
[ 401.408123][ T0] [c0000000062cbd30] [c000000000136ed4] rescuer_thread+0x2c4/0x3f0
[ 401.408268][ T0] [c0000000062cbdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.408351][ T0] [c0000000062cbe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.408463][ T0] task:rcu_par_gp state:I stack:14624 pid: 4 ppid: 2 flags:0x00000800
[ 401.408629][ T0] Call Trace:
[ 401.408725][ T0] [c0000000062cfa00] [c0000000062cfa60] 0xc0000000062cfa60 (unreliable)
[ 401.408830][ T0] [c0000000062cfbe0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.408927][ T0] [c0000000062cfc40] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.409030][ T0] [c0000000062cfd00] [c000000000b879b8] schedule+0x68/0x120
[ 401.409143][ T0] [c0000000062cfd30] [c000000000136ed4] rescuer_thread+0x2c4/0x3f0
[ 401.409256][ T0] [c0000000062cfdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.409349][ T0] [c0000000062cfe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.409458][ T0] task:kworker/0:0 state:I stack:13888 pid: 5 ppid: 2 flags:0x00000800
[ 401.409749][ T0] Workqueue: 0x0 (events)
[ 401.409923][ T0] Call Trace:
[ 401.409986][ T0] [c0000000062d39f0] [c0000000062d3a50] 0xc0000000062d3a50 (unreliable)
[ 401.410125][ T0] [c0000000062d3bd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.410263][ T0] [c0000000062d3c30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.410371][ T0] [c0000000062d3cf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.410450][ T0] [c0000000062d3d20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.410567][ T0] [c0000000062d3db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.410671][ T0] [c0000000062d3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.410795][ T0] task:kworker/0:0H state:I stack:14624 pid: 6 ppid: 2 flags:0x00000800
[ 401.411024][ T0] Call Trace:
[ 401.411117][ T0] [c0000000062d79f0] [c0000000062d7a50] 0xc0000000062d7a50 (unreliable)
[ 401.411267][ T0] [c0000000062d7bd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.411401][ T0] [c0000000062d7c30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.411484][ T0] [c0000000062d7cf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.411575][ T0] [c0000000062d7d20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.411666][ T0] [c0000000062d7db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.411722][ T0] [c0000000062d7e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.411809][ T0] task:kworker/u8:0 state:I stack:14624 pid: 7 ppid: 2 flags:0x00000800
[ 401.411923][ T0] Call Trace:
[ 401.411969][ T0] [c0000000062db9f0] [c0000000062dba50] 0xc0000000062dba50 (unreliable)
[ 401.412045][ T0] [c0000000062dbbd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.412143][ T0] [c0000000062dbc30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.413324][ T0] [c0000000062dbcf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.413402][ T0] [c0000000062dbd20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.413468][ T0] [c0000000062dbdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.413522][ T0] [c0000000062dbe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.413595][ T0] task:mm_percpu_wq state:I stack:14624 pid: 8 ppid: 2 flags:0x00000800
[ 401.413699][ T0] Call Trace:
[ 401.413745][ T0] [c0000000062dfa00] [c0000000062dfa60] 0xc0000000062dfa60 (unreliable)
[ 401.413826][ T0] [c0000000062dfbe0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.413894][ T0] [c0000000062dfc40] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.413960][ T0] [c0000000062dfd00] [c000000000b879b8] schedule+0x68/0x120
[ 401.414025][ T0] [c0000000062dfd30] [c000000000136ed4] rescuer_thread+0x2c4/0x3f0
[ 401.414105][ T0] [c0000000062dfdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.414185][ T0] [c0000000062dfe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.414275][ T0] task:ksoftirqd/0 state:S stack:14544 pid: 9 ppid: 2 flags:0x00000800
[ 401.414506][ T0] Call Trace:
[ 401.414729][ T0] [c0000000062e3a20] [c0000000062e3a80] 0xc0000000062e3a80 (unreliable)
[ 401.415109][ T0] [c0000000062e3c00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.415651][ T0] [c0000000062e3c60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.415944][ T0] [c0000000062e3d20] [c000000000b879b8] schedule+0x68/0x120
[ 401.416044][ T0] [c0000000062e3d50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.416104][ T0] [c0000000062e3db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.416177][ T0] [c0000000062e3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.416261][ T0] task:rcu_sched state:I stack:12928 pid: 10 ppid: 2 flags:0x00000800
[ 401.416378][ T0] Call Trace:
[ 401.416423][ T0] [c0000000062e7990] [c0000000062e7a50] 0xc0000000062e7a50 (unreliable)
[ 401.416501][ T0] [c0000000062e7b70] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.416569][ T0] [c0000000062e7bd0] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.416633][ T0] [c0000000062e7c90] [c000000000b879b8] schedule+0x68/0x120
[ 401.416705][ T0] [c0000000062e7cc0] [c0000000001b7b54] rcu_gp_kthread+0xa94/0xc00
[ 401.416798][ T0] [c0000000062e7db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.416871][ T0] [c0000000062e7e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.416965][ T0] task:migration/0 state:S stack:14496 pid: 11 ppid: 2 flags:0x00000800
[ 401.417050][ T0] Call Trace:
[ 401.417092][ T0] [c0000000062eba20] [c0000000062ebaa0] 0xc0000000062ebaa0 (unreliable)
[ 401.417206][ T0] [c0000000062ebc00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.417397][ T0] [c0000000062ebc60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.417631][ T0] [c0000000062ebd20] [c000000000b879b8] schedule+0x68/0x120
[ 401.417930][ T0] [c0000000062ebd50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.418251][ T0] [c0000000062ebdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.418520][ T0] [c0000000062ebe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.418808][ T0] task:kworker/0:1 state:D stack:13392 pid: 12 ppid: 2 flags:0x00000800
[ 401.418951][ T0] Workqueue: events kprobe_optimizer
[ 401.419078][ T0] Call Trace:
[ 401.419121][ T0] [c0000000062ef650] [c0000000062ef710] 0xc0000000062ef710 (unreliable)
[ 401.419213][ T0] [c0000000062ef830] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.419281][ T0] [c0000000062ef890] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.419347][ T0] [c0000000062ef950] [c000000000b879b8] schedule+0x68/0x120
[ 401.419415][ T0] [c0000000062ef980] [c000000000b8e664] schedule_timeout+0x2a4/0x340
[ 401.419484][ T0] [c0000000062efa80] [c000000000b894ec] wait_for_completion+0x9c/0x170
[ 401.419552][ T0] [c0000000062efae0] [c0000000001ac85c] __wait_rcu_gp+0x19c/0x210
[ 401.419619][ T0] [c0000000062efb40] [c0000000001ac90c] synchronize_rcu_tasks_generic+0x3c/0x70
[ 401.419690][ T0] [c0000000062efbe0] [c00000000022a3dc] kprobe_optimizer+0x1dc/0x470
[ 401.419757][ T0] [c0000000062efc60] [c000000000136684] process_one_work+0x2f4/0x530
[ 401.419823][ T0] [c0000000062efd20] [c000000000138d28] worker_thread+0x78/0x570
[ 401.419891][ T0] [c0000000062efdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.419976][ T0] [c0000000062efe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.420051][ T0] task:cpuhp/0 state:S stack:14544 pid: 13 ppid: 2 flags:0x00000800
[ 401.420136][ T0] Call Trace:
[ 401.420197][ T0] [c0000000062ffa20] [c0000000062ffa80] 0xc0000000062ffa80 (unreliable)
[ 401.420342][ T0] [c0000000062ffc00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.420519][ T0] [c0000000062ffc60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.420704][ T0] [c0000000062ffd20] [c000000000b879b8] schedule+0x68/0x120
[ 401.420904][ T0] [c0000000062ffd50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.421134][ T0] [c0000000062ffdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.421487][ T0] [c0000000062ffe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.421834][ T0] task:cpuhp/1 state:S stack:13584 pid: 14 ppid: 2 flags:0x00000800
[ 401.422146][ T0] Call Trace:
[ 401.422233][ T0] [c0000000063c3a20] [c0000000063c3a80] 0xc0000000063c3a80 (unreliable)
[ 401.422314][ T0] [c0000000063c3c00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.422378][ T0] [c0000000063c3c60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.422444][ T0] [c0000000063c3d20] [c000000000b879b8] schedule+0x68/0x120
[ 401.422511][ T0] [c0000000063c3d50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.422575][ T0] [c0000000063c3db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.422658][ T0] [c0000000063c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.422742][ T0] task:migration/1 state:S stack:13472 pid: 15 ppid: 2 flags:0x00000800
[ 401.422826][ T0] Call Trace:
[ 401.422873][ T0] [c0000000063c7a20] [c0000000063c7aa0] 0xc0000000063c7aa0 (unreliable)
[ 401.423195][ T0] [c0000000063c7c00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.423285][ T0] [c0000000063c7c60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.423354][ T0] [c0000000063c7d20] [c000000000b879b8] schedule+0x68/0x120
[ 401.423421][ T0] [c0000000063c7d50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.423486][ T0] [c0000000063c7db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.423576][ T0] [c0000000063c7e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.423783][ T0] task:ksoftirqd/1 state:S stack:14544 pid: 16 ppid: 2 flags:0x00000800
[ 401.424112][ T0] Call Trace:
[ 401.424410][ T0] [c0000000063cba20] [c0000000063cba80] 0xc0000000063cba80 (unreliable)
[ 401.424775][ T0] [c0000000063cbc00] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.425005][ T0] [c0000000063cbc60] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.425124][ T0] [c0000000063cbd20] [c000000000b879b8] schedule+0x68/0x120
[ 401.425197][ T0] [c0000000063cbd50] [c000000000148774] smpboot_thread_fn+0x254/0x260
[ 401.425299][ T0] [c0000000063cbdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.425398][ T0] [c0000000063cbe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.425504][ T0] task:kworker/1:0 state:I stack:14624 pid: 17 ppid: 2 flags:0x00000800
[ 401.425684][ T0] Call Trace:
[ 401.425748][ T0] [c0000000063cf9f0] [c0000000063cfa50] 0xc0000000063cfa50 (unreliable)
[ 401.425845][ T0] [c0000000063cfbd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.425916][ T0] [c0000000063cfc30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.425983][ T0] [c0000000063cfcf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.426050][ T0] [c0000000063cfd20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.426123][ T0] [c0000000063cfdb0] [c000000000142424] kthread+0x194/0x1a0
[ 401.426229][ T0] [c0000000063cfe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.426327][ T0] task:kworker/1:0H state:I stack:14320 pid: 18 ppid: 2 flags:0x00000800
[ 401.426494][ T0] Call Trace:
[ 401.426577][ T0] [c0000000063d39f0] [c0000000063d3ab0] 0xc0000000063d3ab0 (unreliable)
[ 401.426685][ T0] [c0000000063d3bd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.426772][ T0] [c0000000063d3c30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.426868][ T0] [c0000000063d3cf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.426969][ T0] [c0000000063d3d20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.427082][ T0] [c0000000063d3db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.427244][ T0] [c0000000063d3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.427403][ T0] task:kworker/0:2 state:I stack:14320 pid: 19 ppid: 2 flags:0x00000800
[ 401.427624][ T0] Workqueue: 0x0 (events)
[ 401.427768][ T0] Call Trace:
[ 401.427840][ T0] [c0000000063d79f0] [c0000000063d7ab0] 0xc0000000063d7ab0 (unreliable)
[ 401.427981][ T0] [c0000000063d7bd0] [c000000000019d70] __switch_to+0x2e0/0x4a0
[ 401.428096][ T0] [c0000000063d7c30] [c000000000b87228] __schedule+0x288/0x9b0
[ 401.428303][ T0] [c0000000063d7cf0] [c000000000b879b8] schedule+0x68/0x120
[ 401.428394][ T0] [c0000000063d7d20] [c000000000138dac] worker_thread+0xfc/0x570
[ 401.428470][ T0] [c0000000063d7db0] [c000000000142424] kthread+0x194/0x1a0
[ 401.428575][ T0] [c0000000063d7e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 401.429454][ T0] Sched Debug Version: v0.11, 5.10.0-rc6-gcc-8.2.0-01356-ga1aeabd25a36-dirty #563
[ 401.429604][ T0] ktime : 383770.000000
[ 401.429683][ T0] sched_clk : 401429.227980
[ 401.429744][ T0] cpu_clk : 401429.232778
[ 401.429799][ T0] jiffies : 4294975673
[ 401.429926][ T0]
[ 401.430003][ T0] sysctl_sched
[ 401.430066][ T0] .sysctl_sched_latency : 12.000000
[ 401.430152][ T0] .sysctl_sched_min_granularity : 1.500000
[ 401.430339][ T0] .sysctl_sched_wakeup_granularity : 2.000000
[ 401.430524][ T0] .sysctl_sched_child_runs_first : 0
[ 401.430688][ T0] .sysctl_sched_features : 4139835
[ 401.430900][ T0] .sysctl_sched_tunable_scaling : 1 (logarithmic)
[ 401.431124][ T0]
[ 401.431697][ T0] cpu#0
[ 401.431766][ T0] .nr_running : 0
[ 401.431813][ T0] .nr_switches : 1055
[ 401.431865][ T0] .nr_uninterruptible : 2
[ 401.432042][ T0] .next_balance : 4294.937296
[ 401.432103][ T0] .curr->pid : 0
[ 401.432195][ T0] .clock : 401423.022270
[ 401.432313][ T0] .clock_task : 401423.022270
[ 401.432415][ T0] .avg_idle : 1000000
[ 401.432488][ T0] .max_idle_balance_cost : 500000
[ 401.432817][ T0]
[ 401.433054][ T0] cfs_rq[0]:/
[ 401.433196][ T0] .exec_clock : 0.000000
[ 401.433386][ T0] .MIN_vruntime : 0.000001
[ 401.433503][ T0] .min_vruntime : 278.095255
[ 401.433596][ T0] .max_vruntime : 0.000001
[ 401.433691][ T0] .spread : 0.000000
[ 401.433784][ T0] .spread0 : 0.000000
[ 401.433886][ T0] .nr_spread_over : 0
[ 401.433954][ T0] .nr_running : 0
[ 401.434039][ T0] .load : 0
[ 401.434127][ T0] .load_avg : 0
[ 401.434235][ T0] .runnable_avg : 0
[ 401.434341][ T0] .util_avg : 0
[ 401.434451][ T0] .util_est_enqueued : 0
[ 401.434540][ T0] .removed.load_avg : 0
[ 401.434611][ T0] .removed.util_avg : 0
[ 401.434697][ T0] .removed.runnable_avg : 0
[ 401.434811][ T0] .tg_load_avg_contrib : 0
[ 401.434902][ T0] .tg_load_avg : 0
[ 401.435203][ T0]
[ 401.435308][ T0] rt_rq[0]:
[ 401.435394][ T0] .rt_nr_running : 0
[ 401.435481][ T0] .rt_nr_migratory : 0
[ 401.435569][ T0] .rt_throttled : 0
[ 401.435678][ T0] .rt_time : 0.000000
[ 401.435772][ T0] .rt_runtime : 950.000000
[ 401.435942][ T0]
[ 401.436017][ T0] dl_rq[0]:
[ 401.436116][ T0] .dl_nr_running : 0
[ 401.436212][ T0] .dl_nr_migratory : 0
[ 401.436301][ T0] .dl_bw->bw : 996147
[ 401.436386][ T0] .dl_bw->total_bw : 0
[ 401.436476][ T0]
[ 401.436560][ T0] runnable tasks:
[ 401.436614][ T0] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[ 401.436687][ T0] -------------------------------------------------------------------------------------------------------------
[ 401.436875][ T0] D swapper/0 1 84.404220 26 120 0.000000 69.398526 0.000000 0 0 /
[ 401.437357][ T0] S kthreadd 2 80.816484 18 120 0.000000 24.915098 0.000000 0 0 /
[ 401.437554][ T0] I rcu_gp 3 26.218815 2 100 0.000000 1.771584 0.000000 0 0 /
[ 401.437698][ T0] I rcu_par_gp 4 28.434004 2 100 0.000000 0.138216 0.000000 0 0 /
[ 401.437853][ T0] I kworker/0:0 5 86.357041 8 120 0.000000 7.010072 0.000000 0 0 /
[ 401.438002][ T0] I kworker/0:0H 6 32.481348 2 100 0.000000 0.097112 0.000000 0 0 /
[ 401.438144][ T0] I kworker/u8:0 7 32.635000 2 120 0.000000 0.086604 0.000000 0 0 /
[ 401.438368][ T0] I mm_percpu_wq 8 34.185643 2 100 0.000000 0.118036 0.000000 0 0 /
[ 401.438544][ T0] S ksoftirqd/0 9 36.753489 3 120 0.000000 0.617720 0.000000 0 0 /
[ 401.438686][ T0] I rcu_sched 10 79.402224 7 120 0.000000 9.592868 0.000000 0 0 /
[ 401.438890][ T0] S migration/0 11 0.100901 98 0 0.000000 40.445210 0.000000 0 0 /
[ 401.439041][ T0] D kworker/0:1 12 83.770462 4 120 0.000000 4.564404 0.000000 0 0 /
[ 401.439230][ T0] S cpuhp/0 13 54.369911 3 120 0.000000 1.278230 0.000000 0 0 /
[ 401.439412][ T0] I kworker/0:2 19 278.095255 384 120 0.000000 187.691038 0.000000 0 0 /
[ 401.439939][ T0]
[ 401.440140][ T0] cpu#1
[ 401.440250][ T0] .nr_running : 0
[ 401.440331][ T0] .nr_switches : 196
[ 401.440434][ T0] .nr_uninterruptible : 0
[ 401.440500][ T0] .next_balance : 4294.937296
[ 401.440552][ T0] .curr->pid : 0
[ 401.440631][ T0] .clock : 401422.799786
[ 401.440689][ T0] .clock_task : 401422.799786
[ 401.440777][ T0] .avg_idle : 1000000
[ 401.440865][ T0] .max_idle_balance_cost : 500000
[ 401.440945][ T0]
[ 401.441027][ T0] rt_rq[1]:
[ 401.441076][ T0] .rt_nr_running : 0
[ 401.441127][ T0] .rt_nr_migratory : 0
[ 401.441197][ T0] .rt_throttled : 0
[ 401.441255][ T0] .rt_time : 0.000000
[ 401.441315][ T0] .rt_runtime : 950.000000
[ 401.441395][ T0]
[ 401.441445][ T0] dl_rq[1]:
[ 401.441497][ T0] .dl_nr_running : 0
[ 401.441555][ T0] .dl_nr_migratory : 0
[ 401.441609][ T0] .dl_bw->bw : 996147
[ 401.441665][ T0] .dl_bw->total_bw : 0
[ 401.441717][ T0]
[ 401.441755][ T0] runnable tasks:
[ 401.441817][ T0] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[ 401.441888][ T0] -------------------------------------------------------------------------------------------------------------
[ 401.441995][ T0] S cpuhp/1 14 7.177520 3 120 0.000000 11.790932 0.000000 0 0 /
[ 401.442211][ T0] S migration/1 15 0.000000 98 0 0.000000 39.188082 0.000000 0 0 /
[ 401.442383][ T0] S ksoftirqd/1 16 0.312838 3 120 0.000000 3.826106 0.000000 0 0 /
[ 401.442615][ T0] I kworker/1:0 17 -3.720346 3 120 0.000000 0.592222 0.000000 0 0 /
[ 401.442879][ T0] I kworker/1:0H 18 -4.047847 3 100 0.000000 0.211754 0.000000 0 0 /
[ 401.443037][ T0]
[ 401.443407][ T0]
[ 401.443407][ T0] Showing all locks held in the system:
[ 401.443722][ T0] 2 locks held by swapper/0/1:
[ 401.443859][ T0] #0: c000000000f6be60 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0xcc/0x140
[ 401.444836][ T0] #1: c000000000f6bdd0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x68/0x280
[ 401.445096][ T0] 5 locks held by kworker/0:1/12:
[ 401.445223][ T0] #0: c000000006070138 ((wq_completion)events){....}-{0:0}, at: process_one_work+0x278/0x530
[ 401.445408][ T0] #1: c0000000062efcc0 ((optimizing_work).work){....}-{0:0}, at: process_one_work+0x278/0x530
[ 401.445528][ T0] #2: c00000000107de60 (kprobe_mutex){....}-{3:3}, at: kprobe_optimizer+0x50/0x470
[ 401.445610][ T0] #3: c000000000f6bdd0 (cpu_hotplug_lock){....}-{0:0}, at: kprobe_optimizer+0x58/0x470
[ 401.445746][ T0] #4: c000000000f6d018 (text_mutex){....}-{3:3}, at: kprobe_optimizer+0x70/0x470
[ 401.445895][ T0]
[ 401.445934][ T0] =============================================
[ 401.445934][ T0]
[ 401.446043][ T0] Showing busy workqueues and worker pools:
[ 401.446139][ T0] workqueue events: flags=0x0
[ 401.446275][ T0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 401.446602][ T0] in-flight: 12:kprobe_optimizer
[ 401.447083][ T0] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 19 5
* Re: [PATCH 5/8] powerpc/64s/powernv: ratelimit harmless HMI error printing
From: Michael Ellerman @ 2020-12-02 13:00 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc, Mahesh Salgaonkar
In-Reply-To: <20201128070728.825934-6-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> Harmless HMI errors can be triggered by guests in some cases, and don't
> contain much useful information anyway. Ratelimit these to avoid
> flooding the console/logs.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/platforms/powernv/opal-hmi.c | 27 +++++++++++++----------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
> index 3e1f064a18db..959da6df0227 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -240,19 +240,22 @@ static void print_hmi_event_info(struct OpalHMIEvent *hmi_evt)
> break;
> }
>
> - printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
> - level, sevstr,
> - hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?
> - "Recovered" : "Not recovered");
> - error_info = hmi_evt->type < ARRAY_SIZE(hmi_error_types) ?
> - hmi_error_types[hmi_evt->type]
> - : "Unknown";
> - printk("%s Error detail: %s\n", level, error_info);
> - printk("%s HMER: %016llx\n", level, be64_to_cpu(hmi_evt->hmer));
> - if ((hmi_evt->type == OpalHMI_ERROR_TFAC) ||
> - (hmi_evt->type == OpalHMI_ERROR_TFMR_PARITY))
> - printk("%s TFMR: %016llx\n", level,
> + if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || printk_ratelimit()) {
> + printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
> + level, sevstr,
> + hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?
> + "Recovered" : "Not recovered");
> + error_info = hmi_evt->type < ARRAY_SIZE(hmi_error_types) ?
> + hmi_error_types[hmi_evt->type]
> + : "Unknown";
> + printk("%s Error detail: %s\n", level, error_info);
> + printk("%s HMER: %016llx\n", level,
> + be64_to_cpu(hmi_evt->hmer));
> + if ((hmi_evt->type == OpalHMI_ERROR_TFAC) ||
> + (hmi_evt->type == OpalHMI_ERROR_TFMR_PARITY))
> + printk("%s TFMR: %016llx\n", level,
> be64_to_cpu(hmi_evt->tfmr));
> + }
Same comment RE printk_ratelimit(), I folded this in:
diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
index 959da6df0227..f0c1830deb51 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -213,6 +213,8 @@ static void print_hmi_event_info(struct OpalHMIEvent *hmi_evt)
"A hypervisor resource error occurred",
"CAPP recovery process is in progress",
};
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
/* Print things out */
if (hmi_evt->version < OpalHMIEvt_V1) {
@@ -240,7 +242,7 @@ static void print_hmi_event_info(struct OpalHMIEvent *hmi_evt)
break;
}
- if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || printk_ratelimit()) {
+ if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || __ratelimit(&rs)) {
printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
level, sevstr,
hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?
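One property of the condition above that is easy to miss: because || short-circuits, events with a real severity never reach the limiter, so they always print and never use up the allowance for the harmless ones. A minimal userspace sketch of that shape follows. toy_budget and should_print are hypothetical names, and the limiter is reduced to a plain counter rather than the kernel's interval/burst logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ratelimit state: just a message allowance. */
struct toy_budget {
	int remaining;
};

/* Consume one unit of allowance; false once it is exhausted. */
static bool toy_budget_take(struct toy_budget *b)
{
	if (b->remaining <= 0)
		return false;
	b->remaining--;
	return true;
}

/*
 * Same shape as `severity != NO_ERROR || limiter()`: the limiter is only
 * consulted, and only charged, for the harmless case.
 */
static bool should_print(bool is_real_error, struct toy_budget *b)
{
	return is_real_error || toy_budget_take(b);
}
```

With an allowance of 2, any number of real errors print without touching the counter, the first two harmless events print, and later harmless events are dropped.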
cheers
* Re: [PATCH 4/8] KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests
From: Michael Ellerman @ 2020-12-02 12:58 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc, Mahesh Salgaonkar
In-Reply-To: <20201128070728.825934-5-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> A number of machine check exceptions are triggerable by the guest.
> Ratelimit these to avoid a guest flooding the host console and logs.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/kvm/book3s_hv.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index e3b1839fc251..c94f9595133d 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1328,8 +1328,12 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
> r = RESUME_GUEST;
> break;
> case BOOK3S_INTERRUPT_MACHINE_CHECK:
> - /* Print the MCE event to host console. */
> - machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
> + /*
> + * Print the MCE event to host console. Ratelimit so the guest
> + * can't flood the host log.
> + */
> + if (printk_ratelimit())
> + machine_check_print_event_info(&vcpu->arch.mce_evt,false, true);
You're not supposed to use printk_ratelimit(), because there's a single
rate limit state for all printks, i.e. some other noisy printk() can
cause this one to never be printed.
I folded this in:
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbbc4f0a26fe..cfaa91b27112 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1327,12 +1327,14 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_SYSTEM_RESET:
r = RESUME_GUEST;
break;
- case BOOK3S_INTERRUPT_MACHINE_CHECK:
+ case BOOK3S_INTERRUPT_MACHINE_CHECK: {
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
/*
* Print the MCE event to host console. Ratelimit so the guest
* can't flood the host log.
*/
- if (printk_ratelimit())
+ if (__ratelimit(&rs))
machine_check_print_event_info(&vcpu->arch.mce_evt,false, true);
/*
@@ -1361,6 +1363,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
r = RESUME_HOST;
break;
+ }
case BOOK3S_INTERRUPT_PROGRAM:
{
ulong flags;
@@ -1520,12 +1523,16 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
+ {
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
/* Pass the machine check to the L1 guest */
r = RESUME_HOST;
/* Print the MCE event to host console. */
- if (printk_ratelimit())
+ if (__ratelimit(&rs))
machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
break;
+ }
/*
* We get these next two if the guest accesses a page which it thinks
* it has mapped but which is not actually present, either because
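To illustrate the shared-state problem, here is a minimal userspace sketch.
It is a toy model, not the kernel's ratelimit code: struct rl_state,
rl_allow() and mce_allowed_after_noise() are hypothetical names, and time is
passed in as an explicit tick so the behaviour is deterministic.

```c
#include <stdbool.h>

/* Toy model of one rate-limit window; NOT the kernel's struct ratelimit_state. */
struct rl_state {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages allowed so far in this window */
	bool started;	/* false until the first call opens a window */
};

/* Return true if a message may be emitted at tick 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Open a fresh window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/*
 * A noisy caller logs 'noise' times at tick 0 through 'noisy', then the
 * MCE path tries once through 'mce'. Returns 1 if the MCE message gets out.
 * Passing the same state for both models a single global limiter.
 */
static int mce_allowed_after_noise(struct rl_state *noisy,
				   struct rl_state *mce, int noise)
{
	for (int i = 0; i < noise; i++)
		rl_allow(noisy, 0);
	return rl_allow(mce, 0) ? 1 : 0;
}
```

With one state shared by both callers, the noisy caller uses up the burst
and the MCE message is dropped. With a separate state per call site, as in
the folded-in change, the MCE message still gets through.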
cheers
* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-12-02 12:45 UTC (permalink / raw)
To: Nicholas Piggin
Cc: linux-arch, Arnd Bergmann, x86, linux-kernel, linux-mm,
Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <20201202111731.GA2414@hirez.programming.kicks-ass.net>
On Wed, Dec 02, 2020 at 12:17:31PM +0100, Peter Zijlstra wrote:
> So the obvious 'improvement' here would be something like:
>
> for_each_online_cpu(cpu) {
> p = rcu_dereference(cpu_rq(cpu)->curr);
> if (p->active_mm != mm)
> continue;
> __cpumask_set_cpu(cpu, tmpmask);
> }
> on_each_cpu_mask(tmpmask, ...);
>
> The remote CPU will never switch _to_ @mm, on account of it being quite
> dead, but it is quite prone to false negatives.
>
> Consider that __schedule() sets rq->curr *before* context_switch(), this
> means we'll see next->active_mm, even though prev->active_mm might still
> be our @mm.
>
> Now, because we'll be removing the atomic ops from context_switch()'s
> active_mm swizzling, I think we can change this to something like the
> below. The hope being that the cost of the new barrier can be offset by
> the loss of the atomics.
>
> Hmm ?
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 41404afb7f4c..2597c5c0ccb0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
> if (!next->mm) { // to kernel
> enter_lazy_tlb(prev->active_mm, next);
>
> - next->active_mm = prev->active_mm;
> if (prev->mm) // from user
> mmgrab(prev->active_mm);
> else
> @@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
> * case 'prev->active_mm == next->mm' through
> * finish_task_switch()'s mmdrop().
> */
> + next->active_mm = next->mm;
> switch_mm_irqs_off(prev->active_mm, next->mm, next);
I think that next->active_mm store should be after switch_mm(),
otherwise we still race.
>
> if (!prev->mm) { // from kernel
> @@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
>
> if (likely(prev != next)) {
> rq->nr_switches++;
> - /*
> - * RCU users of rcu_dereference(rq->curr) may not see
> - * changes to task_struct made by pick_next_task().
> - */
> - RCU_INIT_POINTER(rq->curr, next);
> +
> + next->active_mm = prev->active_mm;
> + rcu_assign_pointer(rq->curr, next);
> /*
> * The membarrier system call requires each architecture
> * to have a full memory barrier after updating