From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751172AbcGMLPP (ORCPT <rfc822;w@1wt.eu>);
	Wed, 13 Jul 2016 07:15:15 -0400
Received: from mx2.suse.de ([195.135.220.15]:51267 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751114AbcGMLPI (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 13 Jul 2016 07:15:08 -0400
Date: Wed, 13 Jul 2016 13:14:25 +0200
From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Jan Kara <jack@suse.com>,
        Tejun Heo <tj@kernel.org>,
        Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
        linux-kernel@vger.kernel.org, Byungchul Park <byungchul.park@lge.com>,
        Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH v12 0/3] printk: Make printk() completely async
Message-ID: <20160713111425.GG32373@pathway.suse.cz>
References: <20160513131848.2087-1-sergey.senozhatsky@gmail.com>
 <20160629050835.GA730@swordfish>
 <20160712162809.GE32373@pathway.suse.cz>
 <20160713074242.GC563@swordfish>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160713074242.GC563@swordfish>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 2016-07-13 16:42:42, Sergey Senozhatsky wrote:
> Hello,
> 
> On (07/12/16 18:28), Petr Mladek wrote:
> > I have tried the following:
> > 
> >        int a,b;
> > 
> >        for (a = 0; a < 1000; a++) {
> >                for (b = 0; b < 10; b++) {
> >                        pr_cont("item%04d", a*10 + b);
> >                }
> >                pr_cont("\n");
> >        }
> > 
> > 
> > [ 4913.000298] item8740item8741item8742item8743item8744item8745item8746item8747item8748item8749
> > [ 4913.000310] item8750item8751item8752item8753item8754item8755item8756item8757item8758item8759
> > [ 4913.000327] item8760item8761item8762item8763item8764item8765item8766item8767item8768item8769
> > [ 4913.000342] item8770item8771item8772item8773item8774item8775item8776item8777item8778item8779
> > [ 4913.000356] item8780
> > [ 4913.000357] item8781
> > [ 4913.000358] item8782
> > [...]
> 
> hm.. so are there any 'concurrent' printk()-s coming from other CPUs
> that are not getting printed on the console (because of loglevel restrictions),
> but still screw up the cont buffer?.... otherwise, my expectation was that in
> this particular case cpu issues a new pr_cont() only after it has printed
> the current message via call_console_drivers()->write(). so partially flushed
> cont buffer was not really expected to happen. I was wrong, obviously.

To be honest. I am not 100% sure what happens here. One theory is
that the printk kthread is waken because of some previous
non-continuous message. It steals console_lock() and partially flushes
the cont buffer. In this case, the pr_cont() calls are not able to get
console_lock() and basically work in the async mode. The pr_cont()
calls have to store each piece sepatately because the partially
flushed cont buffer is blocked until flushed completely.

Hmm, the strange thing is that I see this problem even when I force
the global synch more by

    echo Y >  /sys/module/printk/parameters/synchronous


I need to dig more into it.

> > Another solution would be to remember cont.cons outside
> > of struct cont. Then the cont buffer could be reused
> > immediately.
> 
> just an idea.
> ... or try to make KERN_CONT SMP-safe. there are many pr_cont() call
> sites. ACPI is one notable example. the others include OOM, some cgroup
> related output (or... was it memcg), etc., etc.
> 
> so we *may be* can have a per-cpu cont buffer and add new API
> pr_cont_begin()/pr_cont_end(), that would disable preemption.
> 
> 
> +	pr_cont_begin() /* preempt_disable() */
> 
> 	for (.....)
> 		pr_cont("%pS ....);
> 
> +	pr_cont_end() /* preempt_enable() */
>
> pr_cont_end() also can flush this CPU's cont buffer and store the log
> line. this will probably break very long cont lines (not sure if we
> have any of those though). and may be flush_on_panic() would have to
> do some extra work iterating each cpu.

It would work but I am a bit scared by the complexity. I think
that we should find a compromise between complexity and
reliability.

> > The saved position will be used either for the actual
> > cont buffer or for the very first message in the ring
> > buffer. The situation might be detected by comparing
> > console_seq, log_next_seq, and log_first_seq.
> 
> we still can have pr_cont() happening on several cpus simultaneously.
> console_seq is getting reset, when we register a new CON_PRINTBUFFER
> console.

I agree that mixing part of lines from different processes/cpus
is not ideal. But it is not much worse than mixing whole lines.
We should do a best effort but we do not need to be perfect.

I continue with scratching my head.

Best Regards,
Petr