From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751524AbcCFNZv (ORCPT <rfc822;w@1wt.eu>);
	Sun, 6 Mar 2016 08:25:51 -0500
Received: from mail-pf0-f181.google.com ([209.85.192.181]:32837 "EHLO
	mail-pf0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751419AbcCFNZn (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Sun, 6 Mar 2016 08:25:43 -0500
Date: Sun, 6 Mar 2016 22:27:03 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: sergey.senozhatsky.work@gmail.com, akpm@linux-foundation.org,
        jack@suse.com, pmladek@suse.com, tj@kernel.org,
        linux-kernel@vger.kernel.org, sergey.senozhatsky@gmail.com,
        jack@suse.cz
Subject: Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Message-ID: <20160306132703.GA927@swordfish>
References: <1457175338-1665-1-git-send-email-sergey.senozhatsky@gmail.com>
 <1457175338-1665-2-git-send-email-sergey.senozhatsky@gmail.com>
 <20160306063251.GA493@swordfish>
 <201603061618.GED43232.MtOQOFSLOFHJFV@I-love.SAKURA.ne.jp>
 <20160306093530.GA26055@swordfish>
 <201603062006.IJD17667.OOQFLtMVHOFSJF@I-love.SAKURA.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201603062006.IJD17667.OOQFLtMVHOFSJF@I-love.SAKURA.ne.jp>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On (03/06/16 20:06), Tetsuo Handa wrote:
[..]
> > do you mean a new worker allocation delay and a MAYDAY timer delay?
> > 
> 
> I don't know what MAYDAY is. I'm talking about a situation where printing_work
> work item is not processed (i.e. printing_work_func() is not called) until
> current work item calls schedule_timeout_*().
> 
> We had a problem that since vmstat_work work item was using system_wq,
> vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
> kworker was looping inside memory allocator without calling schedule_timeout_*()
> due to disk_events_workfn() doing GFP_NOIO allocation).

hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
(WQ_MEM_RECLAIM).

[..]
> Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> printing_work_func() will not be called until current work item calls
> schedule_timeout_*(). That will be an undesirable random delay. If you use
> a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
> we can avoid this random delay.

hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread.
need to look more.

[..]
> > 	console_lock();
> > 	for (...) {
> > 		do_foo() {
> > 			...
> > 				pr_err(" ... foo message ...\n");
> > 			...
> > 		}
> > 	}
> > 	console_unlock();
> > 
> > then yes, nothing will be printed until that process executes console_unlock(),
> > because it's console_unlock() that pushes the messages to console drivers.
> 
> Yes, I meant a sequence like
> 
>   console_lock();
>   ptr = kmalloc(GFP_KERNEL);
>   kfree(ptr);
>   console_unlock();
> 
> and kmalloc() prints OOM killer messages rather than failing that allocation.
> Are we sure that there is no such usage?

such usage is quite possible.

problems that I have with console_lock()/console_unlock() is that
these functions serve a double purpose: exclusive printk() lock and a
console_drivers list lock.

**** I haven't really thought about it yet, but I want to split it. ****

console_lock()/console_unlock() can be executed by user space processes
(inside system calls). for example:

SyS_open()
 ...
  chrdev_open()
   tty_open()
    console_device()
     console_lock()
     console_unlock()
      for (;;) {
       call_console_drivers()
      }

or doing `cat /proc/consoles`

SyS_read()
 vfs_read()
  proc_reg_read()
   seq_read()
    c_stop()
     console_unlock()
      for (;;) {
       call_console_drivers()
      }

which can introduce two nasty problems:

1) console_lock() may put user space process in TASK_UNINTERRUPTIBLE for
   unknown period of time -- until current console_sem owner will not
   finish print loop in console_unlock(). no signals, heartbeats, etc.
   will be processed by this user space process.

2) user space process may have to spend an unknown period of time in
   console_unlock() later, pushing "3rd party" kernel messages to
   console drivers. again, not really good.

(kthreads can suffer here too, sure).


in the examples above, a process just wanted to iterate the console_drivers
list in read access mode. so, for instance, in

struct tty_driver *console_device(int *index)
{
        struct console *c;
        struct tty_driver *driver = NULL;

        console_lock();
        for_each_console(c) {
                if (!c->device)
                        continue;
                driver = c->device(c, index);
                if (driver)
                        break;
        }
        console_unlock();
        return driver;
}

instead of console_lock()/console_unlock()->call_console_drivers()
it could do (very schematically)

	read_lock_console();
          for_each_console(c) {
                  if (!c->device)
                          continue;
                  driver = c->device(c, index);
                  if (driver)
                          break;
          }
	read_unlock_console();

and in functions that modify the list, the lock can be acquired in
write mode. example,

int unregister_console(struct console *console)
{
        write_lock_console();
...
                for (a=console_drivers->next, b=console_drivers ;
                     a; b=a, a=b->next) {
                        if (a == console) {
                                b->next = a->next;
                                res = 0;
                                break;
                        }
                }
        }
...
        if (console_drivers != NULL && console->flags & CON_CONSDEV)
                console_drivers->flags |= CON_CONSDEV;

        console->flags &= ~CON_ENABLED;
        write_unlock_unlock();


printk(), thus, will take its own "another" exclusive lock, to guarantee
that only one process can call_console_drivers(), and it will take the
console_drivers list lock in read mode. so other process(-es) that also
want to access console_drivers list in read mode will not wait in
TASK_UNINTERRUPTIBLE.

of course, this means that console_unlock() now does not print
anything to console drivers. it's printk() duty to do this.

and the last statement can be very hard to sell; because I don't know
for sure if there is a code in the kernel that depends on the fact that
console_lock() stops printk() and that console_unlock() prints all of
the printk messages.


that "lock split" also can fix another *theoretical* scenario: suppose,
that for some reason someone's setup has both a) huge number of printk()
calls and b) a relatively huge number of console_lock()/console_unlock()
calls, all happening simultaneously. while async printk helps in (a),
case (b) "detours" the printk async code; and if cpus at lest some of the
cpus that do console_lock()/console_unlock() also disable IRQs or
preemption, then lockups and all the bad things are very much likely.


just an idea.

	-ss