* Re: Serial console is causing system lock-up
From: John Ogness @ 2019-03-07 14:21 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Sergey Senozhatsky, Greg Kroah-Hartman, Steven Rostedt, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <20190307122642.GA10415@tigerII.localdomain>
On 2019-03-07, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>>> The CPU which spins on prb_lock() can have preemption disabled and,
>>> additionally, can have local IRQs disabled, or be under RCU read
>>> side lock. If consoles are busy, then there are CPUs which printk()
>>> data and keep prb_lock contended; prb_lock() does not seem to be
>>> fair. What am I missing?
>>
>> You are correct. Making prb_lock fair might be something we want to
>> look into. Perhaps also based on the loglevel of what needs to be
>> printed. (For example, KERN_ALERT always wins over KERN_CRIT.)
>
> Good.
>
> I'm not insisting, but I have a feeling that touching watchdogs after
> call_console_drivers() might be too late, sometimes. When we spin in
> prb_lock() we wait for all CPUs which are before/ahead of us to
> finish their call_console_drivers(), one by one. So if CPUZ is very
> unlucky and is in atomic context, then prb_lock() for that CPUZ can
> last for N * call_console_drivers(). And depending on N (which also
> includes unfairness) and call_console_drivers() timings NMI watchdog
> may pay CPUZ a visit before it gets its chance to touch watchdogs.
>
> *May be* sometimes we might want to touch watchdogs in prb_lock().
This is an excellent point. The handling of the watchdogs needs to be
more carefully considered/placed.
> So, given the design of new printk(), I can't help thinking about the
> fact that current
> "the winner takes it all"
> may become
> "the winner waits for all".
I am open to looking at implementing a fair prb_cpulock(). I think
without it, your observation for these "multi-CPU emergency message
flood" cases is correct.
> [...]
>
> And this brings us to another pessimistic scenario: a very unlucky
> CPUZ has to spin in prb_lock() waiting for other CPUs to print out
> the very same 2 million chars. Which in terms of printk() latency
> looks to me just like current printk.
If the prb_cpulock() is fair, they are both taking turns printing. The
prb_cpulock() is taken for _each_ printk() call and not (like in the
current printk) for the complete unprinted contents of the ringbuffer.
> John, sorry to ask this, does new printk() design always provide
> latency guarantees good enough for PREEMPT_RT?
Yes, because it is assumed that emergency messages will never occur for
a correctly running system. We've run tests where we pr_info() from all
contexts and the console_loglevel is set such that KERN_INFO is printed
to the console. The storage of the messages into the printk ringbuffer
do not cause any unacceptable latencies. And printing those messages to
the consoles via the fully preemptible printk-kthread also causes no
unacceptable latencies.
Obviously as soon as any emergency message appears, an _unacceptable_
latency occurs. But that is considered OK because the system is no
longer running correctly and it is worth the price to pay to get those
messages with such high reliability.
In a previous response[0] to Petr, I talk about defining _all_ console
printing as emergency and requiring userspace to handle informational
messages. With that definition, one could argue that the atomic consoles
are enough and we could avoid the printk-kthread(s) altogether. I am not
absolutely against this idea. But there (most likely) will be consoles
that cannot support atomic. And how should they be handled? We could
keep the current (quite complex) implementation in place just for
them. But I really wonder if all that is necessary just for those (few?)
consoles. printk-kthreads seem to me to be the ideal solution
(particularly when dealing with PREEMPT_RT, where nearly everything
important becomes a kthread). My RFC series implements a very simple
(naive) approach to kthread printing. I believe there is a lot[1][2] we
can do to make the printk-kthread printing more reliable.
John Ogness
[0] https://lkml.kernel.org/r/87k1hbbq2m.fsf@linutronix.de
[1] https://lkml.kernel.org/r/87o96p9gtx.fsf@linutronix.de
[2] https://lkml.kernel.org/r/87lg1rggcz.fsf@linutronix.de
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: John Stoffel @ 2019-03-07 14:08 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <87ftrzbp3y.fsf@linutronix.de>
>>>>> "John" == John Ogness <john.ogness@linutronix.de> writes:
John> On 2019-03-06, Steven Rostedt <rostedt@goodmis.org> wrote:
>>> This bug only happens if we select large logbuffer (millions of
>>> characters). With smaller log buffer, there are messages "** X printk
>>> messages dropped", but there's no lockup.
>>>
>>> The kernel apparently puts 2 million characters into a console log
>>> buffer, then takes some lock and than tries to write all of them to a
>>> slow serial line.
>>>
>>> [...]
>>>
>>> The MD-RAID is supposed to recalculate data for the corrupted device
>>> and bring it back to life. However, scrubbing the MD-RAID device
>>> resulted in a lot of reads from the device with bad checksums, these
>>> were reported to the log and killed the machine.
>>>
>>> I made a patch to dm-integrity to rate-limit the error messages. But
>>> anyway - killing the machine in case of too many log messages seems
>>> bad. If the log messages are produced faster than the kernel can
>>> write them, the kernel should discard some of them, not kill itself.
>>
>> Sounds like another aurgment for the new printk design.
John> Assuming the bad checksum messages are considered an emergency
John> (for example, at least loglevel KERN_WARN), then the new printk
John> design would print those messages synchronously to the slow
John> serial line in the context of the driver as the driver is
John> producing them.
John> There wouldn't be a lock-up, but it would definitely slow down
John> the driver. The situation of "messages being produced faster
John> than the kernel can write them" would never exist because the
John> printk() call will only return after the writing is completed. I
John> am curious if that would be acceptable here?
The real problem is the disconnect between serial console speed and
capacity in bits/sec and that of the regular console. Serial, esp at
9600 baud is just a slow and limited resource which needs to be
handled differently than a graphical console.
I'm also big on ratelimiting messages, even critical warning
messages. Too much redundant info doesn't help anyone. And what a
subsystem thinks is critical, may not be critical to the system as a
whole.
In this case, if these checksum messages are telling us that there's
corruption, why isn't dm-integrity going readonly and making the block
device get the filesystem to also go readonly and to stop the damage
right away?
If it's just a warning for the niceness, then please rate limit them,
or summarize them in some more useful way. Or even log them to
somewhere else than the console once the problem is noted.
John
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Mikulas Patocka @ 2019-03-07 13:12 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, dm-devel, linux-serial
In-Reply-To: <20190307015652.GA4893@jagdpanzerIV>
On Thu, 7 Mar 2019, Sergey Senozhatsky wrote:
> On (03/06/19 11:07), Mikulas Patocka wrote:
> > This bug only happens if we select large logbuffer (millions of
> > characters). With smaller log buffer, there are messages "** X printk
> > messages dropped", but there's no lockup.
> >
> > The kernel apparently puts 2 million characters into a console log buffer,
> > then takes some lock and than tries to write all of them to a slow serial
> > line.
>
> Do you run a preemtible kernel?
>
> -ss
The kernel has voluntary preemption.
Mikulas
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Mikulas Patocka @ 2019-03-07 12:54 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Sergey Senozhatsky, John Ogness, Greg Kroah-Hartman,
Steven Rostedt, dm-devel, linux-serial
In-Reply-To: <20190307122642.GA10415@tigerII.localdomain>
On Thu, 7 Mar 2019, Sergey Senozhatsky wrote:
> On (03/07/19 11:37), John Ogness wrote:
> > > Sorry John, the reasoning is that I'm trying to understand
> > > why this does not look like soft or hard lock-up or RCU stall
> > > scenario.
> >
> > The reason is that you are seeing data being printed on the console. The
> > watchdogs (soft, hard, rcu, nmi) are all touched with each emergency
> > message.
>
> Correct. Please see below.
>
> > > The CPU which spins on prb_lock() can have preemption disabled and,
> > > additionally, can have local IRQs disabled, or be under RCU read
> > > side lock. If consoles are busy, then there are CPUs which printk()
> > > data and keep prb_lock contended; prb_lock() does not seem to be
> > > fair. What am I missing?
> >
> > You are correct. Making prb_lock fair might be something we want to look
> > into. Perhaps also based on the loglevel of what needs to be
> > printed. (For example, KERN_ALERT always wins over KERN_CRIT.)
>
> Good.
>
> I'm not insisting, but I have a feeling that touching watchdogs after
> call_console_drivers() might be too late, sometimes. When we spin in
There are still NMI stacktraces - see here
http://people.redhat.com/~mpatocka/testcases/console-lockup/
> prb_lock() we wait for all CPUs which are before/ahead of us to
> finish their call_console_drivers(), one by one. So if CPUZ is very
> unlucky and is in atomic context, then prb_lock() for that CPUZ can
> last for N * call_console_drivers(). And depending on N (which also
> includes unfairness) and call_console_drivers() timings NMI watchdog
> may pay CPUZ a visit before it gets its chance to touch watchdogs.
>
> *May be* sometimes we might want to touch watchdogs in prb_lock().
>
> So, given the design of new printk(), I can't help thinking about the
> fact that current
> "the winner takes it all"
> may become
> "the winner waits for all".
>
> Mikulas mentioned that he observes "** X messages dropped" warnings.
> And this suggests that, _most likely_, we had significantly more that
> 2 CPUs calling printk() concurrently.
When I observe these messages (usually with small log buffer size), it
doesn't lockup.
The lockups happen because the messages are stuffed into a 2MiB buffer and
then printed over 115200 baud serial line.
You can see this:
http://people.redhat.com/~mpatocka/testcases/console-lockup/5.0-total-lockup.txt
Here it attempted to write 1355277 bytes over the slow serial line - it
takes a few minutes.
> - A single source - single CPU calling printk() - would not lose messages,
> because it would print its own message before it printk() another one (we
> still could have another CPU rescheduling under console_sem, but I don't
> think this is the case).
>
> - Two CPUs would also probably not lose messages, Steven's console_owner
> would throttle them down.
>
> So I think what we have was a spike of WARN/ERR printk-s comming from
> N CPUs concurrently.
Losing messages is in my opinion reasonable (if they are produced faster
than they could be printed). Another possibility is to always write the
message synchronously and exit printk only after it is written.
> And this brings us to another pessimistic scenario: a very unlucky
> CPUZ has to spin in prb_lock() waiting for other CPUs to print out
> the very same 2 million chars. Which in terms of printk() latency
> looks to me just like current printk.
>
> John, sorry to ask this, does new printk() design always provide
> latency guarantees good enough for PREEMPT_RT?
>
> I'm surely missing something. Where am I wrong?
>
> -ss
Mikulas
^ permalink raw reply
* Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread
From: Petr Mladek @ 2019-03-07 12:50 UTC (permalink / raw)
To: John Ogness
Cc: Sergey Senozhatsky, linux-kernel, Peter Zijlstra, Steven Rostedt,
Daniel Wang, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
Alan Cox, Jiri Slaby, Peter Feiner, linux-serial,
Sergey Senozhatsky
In-Reply-To: <87r2bjbt47.fsf@linutronix.de>
On Wed 2019-03-06 22:17:12, John Ogness wrote:
> On 2019-03-06, Petr Mladek <pmladek@suse.com> wrote:
> >> I would like to clarify that message supression (i.e. console loglevel)
> >> is a method of reducing what is printed. It does nothing to address the
> >> issues related to console printing. My proposal focusses on addressing
> >> the issues related to console printing.
> >>
> >> Console printing is a convenient feature to allow a kernel to
> >> communicate information to a user without any reliance on
> >> userspace. IMHO there are 2 categories of messages that the kernel will
> >> communicate. The first is informational (usb events, wireless and
> >> ethernet connectivity, filesystem events, etc.). Since this category of
> >> messages occurs during normal runtime, we should expect that it does not
> >> cause adverse effects to the rest of the system (such as latencies and
> >> non-deterministic behavior).
> >>
> >> The second category is for emergency situations, where the kernel needs
> >> to report something unusual (panic, BUG, WARN, etc.). In some of these
> >> situations, it may be the last thing the kernel ever does. We should
> >> expect this category to focus on getting the message out as reliably as
> >> possible. Even if it means disturbing the system with large latencies.
> >>
> >> _Both_ categories are important for the user, but their requirements are
> >> different:
> >>
> >> informational: non-disturbing
> >> emergency: reliable
> >
> > Isn't this already handled by the console_level?
>
> You mean that the current console level is being used to set the
> boundary between emergency and informational messages? Definitely no!
> Take any Linux distribution and look at their default console_level
> setting. Even the kernel code defaults to a value of 7!
CONSOLE_LOGLEVEL_DEFAULT is one thing. The real life is another thing.
I could hardly imagine any distribution that would scary users by all
kernel messages.
For example, SUSE distro boots with "quiet" kernel parameter.
It seems that RedHat does the same as suggested by
https://lkml.kernel.org/r/20180619115726.3098-1-hdegoede@redhat.com
It means that the level '4' is typically used and even lower number
is wanted.
Well, SUSE (or systemd folks?) go even further. rsyslog sets
the console_loglevel to "1".
BTW: The default has been set to "7" in kernel-1.3.71 (1996-01-01).
There is no explanation for this. Well, it looks like a pre-SMP
era. printk() was not serialized with any lock. Also messages
to tty were written separately by tty_write_message(). I guess
the the default was primary for serial console and developers.
> My proposal is making this distinction clearer: a significant increase
> in reliability for emergency messages.
This is true if we fulfil several conditions. I am not sure that
the following list is complete but:
+ The commonly used consoles must provide the atomic write.
Otherwise, it would have only few users.
+ The printk code and console atomic write must be really
safe and maintainable.
+ The serialization of atomic consoles must not cause more
problems than the current approach (always slow pr_err() vs.
random printk() slow, unfair locking) [*]
+ The mixed order of directly printed and postponed messages
must not cause too much confusion. We might need some
userpace tools to sort them.
[*] I agree that the direct console handling has several advantages:
+ Avoid all the problems with console_lock.
+ It is more predictable (no random victims spending
time with handling all messages).
+ It is a natural throttling of heavy printk() users.
Less amount of lost messages (important ones never).
Might reduce (remove) the problem with softlockups.
I am just not able to predict eventual problems. There was a reason
for the current state. People did not want always slow printk().
> Why is that a problem? The focus is reliabilty. We are talking about
> emergency messages here. Messages that should never occur for a
> correctly functioning system. It does not matter if the entire system
> slows down because of it.
This sounds too idealistic:
+ It expects that everyone is using the log levels consistently and
with care. But many people believe that their messages are
the most important ones.
+ Also it expects that all users use similar console_logleves
in all situations. But it depends on the usecase. Sometimes
warning and error messages do not make much sense without
the context, e.g. info or even debug messages.
> > If it gets blocked for some reasons (nobody is perfect) it would
> > block all the other serialized CPUs as well.
>
> Yes, blocking in an atomic context would be bad for any code.
>
> > In each case, we really need to be careful about the complexity.
> > printk() is already complex enough. It might be acceptable if
> > it makes the design cleaner and less tangled. printk() would
> > deserve a redesign.
>
> It is my belief that I am significantly simplifying printk because there
> are no more exotic contexts and situations. Emergency messages are
> atomic and immediate. Context does not matter. Informational messages
> are printed fully preemptible, so console drivers are free to do
> whatever magic they want to do. Do you see that as more complex than the
> current implementation of safe buffers, defers, hand-offs, exclusive
> consoles, and cond_rescheds?
This expects that we would be able to always offload all
non-emergency messages. I am not convinced that it is acceptable.
And I wrote this several times.
Also, please, keep the log buffer and console handling separate.
The new "lockless" log buffer solves problems that are currently
being solved by printk_safe code. While the atomic consoles
and kthread tries to solve problems with unpredictable
printk() cost (softlockups) and make the emergency messages
more reliable (lost messages).
The complexity depends on how the final code would look like.
It might be win if it is more straightforward and organized.
But there are some alarming things:
+ lockless code is always tricky
+ another console_loglevel setting is added
+ one more way to push messages to the console
+ messages are slit into 3 groups:
+ one group was subset of the other in the past
+ newly, two groups need to get joined and sorted
to get the original one
Best Regards,
Petr
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Sergey Senozhatsky @ 2019-03-07 12:26 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Sergey Senozhatsky, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, dm-devel, Mikulas Patocka, linux-serial
In-Reply-To: <87o96nezr2.fsf@linutronix.de>
On (03/07/19 11:37), John Ogness wrote:
> > Sorry John, the reasoning is that I'm trying to understand
> > why this does not look like soft or hard lock-up or RCU stall
> > scenario.
>
> The reason is that you are seeing data being printed on the console. The
> watchdogs (soft, hard, rcu, nmi) are all touched with each emergency
> message.
Correct. Please see below.
> > The CPU which spins on prb_lock() can have preemption disabled and,
> > additionally, can have local IRQs disabled, or be under RCU read
> > side lock. If consoles are busy, then there are CPUs which printk()
> > data and keep prb_lock contended; prb_lock() does not seem to be
> > fair. What am I missing?
>
> You are correct. Making prb_lock fair might be something we want to look
> into. Perhaps also based on the loglevel of what needs to be
> printed. (For example, KERN_ALERT always wins over KERN_CRIT.)
Good.
I'm not insisting, but I have a feeling that touching watchdogs after
call_console_drivers() might be too late, sometimes. When we spin in
prb_lock() we wait for all CPUs which are before/ahead of us to
finish their call_console_drivers(), one by one. So if CPUZ is very
unlucky and is in atomic context, then prb_lock() for that CPUZ can
last for N * call_console_drivers(). And depending on N (which also
includes unfairness) and call_console_drivers() timings NMI watchdog
may pay CPUZ a visit before it gets its chance to touch watchdogs.
*May be* sometimes we might want to touch watchdogs in prb_lock().
So, given the design of new printk(), I can't help thinking about the
fact that current
"the winner takes it all"
may become
"the winner waits for all".
Mikulas mentioned that he observes "** X messages dropped" warnings.
And this suggests that, _most likely_, we had significantly more that
2 CPUs calling printk() concurrently.
- A single source - single CPU calling printk() - would not lose messages,
because it would print its own message before it printk() another one (we
still could have another CPU rescheduling under console_sem, but I don't
think this is the case).
- Two CPUs would also probably not lose messages, Steven's console_owner
would throttle them down.
So I think what we have was a spike of WARN/ERR printk-s comming from
N CPUs concurrently.
And this brings us to another pessimistic scenario: a very unlucky
CPUZ has to spin in prb_lock() waiting for other CPUs to print out
the very same 2 million chars. Which in terms of printk() latency
looks to me just like current printk.
John, sorry to ask this, does new printk() design always provide
latency guarantees good enough for PREEMPT_RT?
I'm surely missing something. Where am I wrong?
-ss
^ permalink raw reply
* Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread
From: John Ogness @ 2019-03-07 12:06 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: linux-kernel, Peter Zijlstra, Petr Mladek, Steven Rostedt,
Daniel Wang, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
Alan Cox, Jiri Slaby, Peter Feiner, linux-serial,
Sergey Senozhatsky
In-Reply-To: <20190304073856.GA552@jagdpanzerIV>
On 2019-03-04, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>> + /* the printk kthread never exits */
>> + for (;;) {
>> + ret = prb_iter_wait_next(&iter, buf,
>> + PRINTK_RECORD_MAX, &master_seq);
>> + if (ret == -ERESTARTSYS) {
>> + continue;
>> + } else if (ret < 0) {
>> + /* iterator invalid, start over */
>> + prb_iter_init(&iter, &printk_rb, NULL);
>> + continue;
>> + }
>> +
>> + msg = (struct printk_log *)buf;
>> + format_text(msg, master_seq, ext_text, &ext_len, text,
>> + &len, printk_time);
>> +
>> + console_lock();
>> + if (len > 0 || ext_len > 0) {
>> + call_console_drivers(ext_text, ext_len, text, len);
>> + boot_delay_msec(msg->level);
>> + printk_delay();
>> + }
>> + console_unlock();
>> + }
>
> This, theoretically, creates a whole new world of possibilities for
> console drivers. Now they can do GFP_KERNEL allocations and stall
> printk_kthread during OOM; or they can explicitly reschedule from
> ->write() callback (via console_conditional_schedule()) because
> console_lock() sets console_may_schedule.
This was the intention. Although, as I mentioned in a previous
response[0], perhaps we should not loosen the requirements on write().
> It's one thing to do cond_resched() (or to let preemption to take
> over) after call_console_drivers() (when we are done printing a
> message to all console drivers) and another thing to let preemption to
> take over while we are printing a messages to the consoles. It
> probably would make sense to disable preemption around
> call_console_drivers().
I could see disabling preemption and interrupts for emergency messages
in the printk-kthread in order to synchronize against an irq_work
secondary printer as suggested in my response[0]. But I don't see an
advantage to disabling preemption in general for
call_console_drivers(). It is exactly that disable_preempt() that is so
harmful for realtime tasks.
John Ogness
[0] https://lkml.kernel.org/r/87lg1rggcz.fsf@linutronix.de
^ permalink raw reply
* Re: [PATCH] serial: max310x: Use struct_size() in devm_kzalloc()
From: Jan Kundrát @ 2019-03-07 11:03 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial, linux-kernel
In-Reply-To: <20190104213913.GA21248@embeddedor>
On pátek 4. ledna 2019 22:39:13 CET, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
>
> struct foo {
> int stuff;
> void *entry[];
> };
>
> instance = devm_kzalloc(dev, sizeof(struct foo) + sizeof(void
> *) * count, GFP_KERNEL);
>
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
>
> instance = devm_kzalloc(dev, struct_size(instance, entry,
> count), GFP_KERNEL);
>
> This code was detected with the help of Coccinelle.
>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> ---
> drivers/tty/serial/max310x.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c
> index 4f479841769a..1d700a8ab517 100644
> --- a/drivers/tty/serial/max310x.c
> +++ b/drivers/tty/serial/max310x.c
> @@ -1197,8 +1197,7 @@ static int max310x_probe(struct device
> *dev, struct max310x_devtype *devtype,
> return PTR_ERR(regmap);
>
> /* Alloc port structure */
> - s = devm_kzalloc(dev, sizeof(*s) +
> - sizeof(struct max310x_one) * devtype->nr, GFP_KERNEL);
> + s = devm_kzalloc(dev, struct_size(s, p, devtype->nr), GFP_KERNEL);
> if (!s) {
> dev_err(dev, "Error allocating port structure\n");
> return -ENOMEM;
Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Tested-by: Jan Kundrát <jan.kundrat@cesnet.cz>
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: John Ogness @ 2019-03-07 10:37 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <20190307091748.GA6307@jagdpanzerIV>
On 2019-03-07, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>>> When the console is constantly printing messages, I wouldn't say
>>>> that looks like a lock-up scenario. It looks like the system is
>>>> busy printing critical information to the console (which it is).
>>>
>>> What if we have N tasks/CPUs calling printk() simultaneously?
>>
>> Then they take turns printing their messages to the console, spinning
>> until they get their turn. This still is not and does not look like a
>> lock-up. But I think you already know this, so I don't understand the
>> reasoning behind asking the question. Maybe you could clarify what
>> you are getting at.
>
> Sorry John, the reasoning is that I'm trying to understand
> why this does not look like soft or hard lock-up or RCU stall
> scenario.
The reason is that you are seeing data being printed on the console. The
watchdogs (soft, hard, rcu, nmi) are all touched with each emergency
message.
> The CPU which spins on prb_lock() can have preemption disabled and,
> additionally, can have local IRQs disabled, or be under RCU read
> side lock. If consoles are busy, then there are CPUs which printk()
> data and keep prb_lock contended; prb_lock() does not seem to be
> fair. What am I missing?
You are correct. Making prb_lock fair might be something we want to look
into. Perhaps also based on the loglevel of what needs to be
printed. (For example, KERN_ALERT always wins over KERN_CRIT.)
> You probably talk about the case when all
> printing CPUs are in preemptible contexts (assumingly this is what
> is happening in dm-integrity case) so they can spin on prb_lock(),
> that's OK. The case I'm talking about is - what if we have the same
> situation, but then one of the CPUs printk()-s from !preemptible.
> Does this make sense?
Yes, you are referring to a worst case. We could have local_irqs
disabled on every CPU while every CPU is hit with an NMI and all those
NMIs want to dump a load of messages. The rest of the system will be
frozen until those NMI printers can finish. But that is still not a
lock-up. At some point those printers should finish and eventually the
system should be able to resume.
John Ogness
^ permalink raw reply
* Re: [RFC PATCH v1 00/25] printk: new implementation
From: John Ogness @ 2019-03-07 9:53 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: linux-kernel, Peter Zijlstra, Petr Mladek, Steven Rostedt,
Daniel Wang, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
Alan Cox, Jiri Slaby, Peter Feiner, linux-serial,
Sergey Senozhatsky
In-Reply-To: <20190304052335.GA6648@jagdpanzerIV>
On 2019-03-04, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>>> - A dedicated kernel thread is created for printing to all consoles
>>>> in a fully preemptible context.
>>>
>>> How do you handle sysrq-<foo> printouts on systems which can't
>>> schedule printk-kthread?
>>
>> If those sysrq printouts are at the emergency loglevel (which most
>> are), then they are printed immediately to the emergency
>> consoles. This has already proved useful for our own kernel debugging
>> work. For example, currently sysrq-z for very large traces result in
>> messages being dropped because of printk buffer overflows. But with
>> the emergency console we always see the full trace buffer.
>
> Are we sure that all systems will always have ->atomic console(-s)
> enabled? Is it possible to convert all console drivers to ->atomic?
> fbcon, for instance (with scrolling and font scaling, etc)?
No, I am not sure if we can convert all console drivers to atomic
consoles. But I think if we don't have to fear disturbing the system,
the possibilities for such an implementation are greater.
> If there are setups which can be fully !atomic (in terms of console
> output) then we, essentially, have a fully preemptible kthread printk
> implementation.
Correct. I've mentioned in another response[0] some ideas about what
could be done to aid this.
I understand that fully preemptible kthread printing is unacceptable for
you. Since all current console drivers are already irq safe, I'm
wondering if using irq_work to handle the emergency printing for console
drivers without write_atomic() would help. (If the printk caller is in a
context that write() supports, then write() could be called directly.)
This would also demand that the irq-safe requirements for write() are
not relaxed. The printk-kthread might still be faster than irq_work, but
it might increase reliability if an irq_work is triggered as an extra
precaution.
John Ogness
[0] https://lkml.kernel.org/r/87o96p9gtx.fsf@linutronix.de
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Sergey Senozhatsky @ 2019-03-07 9:17 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Sergey Senozhatsky, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, dm-devel, Mikulas Patocka, linux-serial
In-Reply-To: <87pnr3hyle.fsf@linutronix.de>
On (03/07/19 09:34), John Ogness wrote:
> >> When the console is constantly printing messages, I wouldn't say that
> >> looks like a lock-up scenario. It looks like the system is busy
> >> printing critical information to the console (which it is).
> >
> > What if we have N tasks/CPUs calling printk() simultaneously?
>
> Then they take turns printing their messages to the console, spinning
> until they get their turn. This still is not and does not look like a
> lock-up. But I think you already know this, so I don't understand the
> reasoning behind asking the question. Maybe you could clarify what you
> are getting at.
Sorry John, the reasoning is that I'm trying to understand
why this does not look like soft or hard lock-up or RCU stall
scenario.
The CPU which spins on prb_lock() can have preemption disabled and,
additionally, can have local IRQs disabled, or be under RCU read
side lock. If consoles are busy, then there are CPUs which printk()
data and keep prb_lock contended; prb_lock() does not seem to be
fair. What am I missing? You probably talk about the case when all
printing CPUs are in preemptible contexts (assumingly this is what
is happening in dm-integrity case) so they can spin on prb_lock(),
that's OK. The case I'm talking about is - what if we have the same
situation, but then one of the CPUs printk()-s from !preemptible.
Does this make sense?
-ss
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: John Ogness @ 2019-03-07 8:34 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <20190307082509.GA1925@jagdpanzerIV>
On 2019-03-07, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>> If the serial console is a really slow one and we have a CPU in
>>> atomic context spinnig on prb_lock, while the prb_lock is always
>>> acquired by other CPUs, then it may look like a lock-up scenario.
>>
>> When the console is constantly printing messages, I wouldn't say that
>> looks like a lock-up scenario. It looks like the system is busy
>> printing critical information to the console (which it is).
>
> What if we have N tasks/CPUs calling printk() simultaneously?
Then they take turns printing their messages to the console, spinning
until they get their turn. This still is not and does not look like a
lock-up. But I think you already know this, so I don't understand the
reasoning behind asking the question. Maybe you could clarify what you
are getting at.
John Ogness
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Sergey Senozhatsky @ 2019-03-07 8:25 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Sergey Senozhatsky, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, dm-devel, Mikulas Patocka, linux-serial
In-Reply-To: <87tvgfhzd6.fsf@linutronix.de>
On (03/07/19 09:17), John Ogness wrote:
> > If the serial console is a really slow one and we have a CPU in atomic
> > context spinnig on prb_lock, while the prb_lock is always acquired by
> > other CPUs, then it may look like a lock-up scenario.
>
> When the console is constantly printing messages, I wouldn't say that
> looks like a lock-up scenario. It looks like the system is busy printing
> critical information to the console (which it is).
What if we have N tasks/CPUs calling printk() simultaneously?
-ss
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: John Ogness @ 2019-03-07 8:17 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <20190307022254.GB4893@jagdpanzerIV>
On 2019-03-07, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>> Sounds like another aurgment for the new printk design.
>>
>> Assuming the bad checksum messages are considered an emergency (for
>> example, at least loglevel KERN_WARN), then the new printk design
>> would print those messages synchronously to the slow serial line in
>> the context of the driver as the driver is producing them.
>
> I wonder if we have several CPU doing the checksum verification.
> New sync printk mechanism has a Big-Konsole-spin-Lock, all CPUs that
> want to printk() should line up and wait, in whatever context they
> currently are.
>
>> There wouldn't be a lock-up, but it would definitely slow down the
>> driver. The situation of "messages being produced faster than the
>> kernel can write them" would never exist because the printk() call
>> will only return after the writing is completed.
>
> If the serial console is a really slow one and we have a CPU in atomic
> context spinnig on prb_lock, while the prb_lock is always acquired by
> other CPUs, then it may look like a lock-up scenario.
When the console is constantly printing messages, I wouldn't say that
looks like a lock-up scenario. It looks like the system is busy printing
critical information to the console (which it is).
John Ogness
^ permalink raw reply
* Re: [RFC PATCH v1 19/25] printk: introduce emergency messages
From: Sergey Senozhatsky @ 2019-03-07 7:30 UTC (permalink / raw)
To: John Ogness
Cc: linux-kernel, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
Steven Rostedt, Daniel Wang, Andrew Morton, Linus Torvalds,
Greg Kroah-Hartman, Alan Cox, Jiri Slaby, Peter Feiner,
linux-serial, Sergey Senozhatsky
In-Reply-To: <20190212143003.48446-20-john.ogness@linutronix.de>
On (02/12/19 15:29), John Ogness wrote:
[..]
> +static bool console_can_emergency(int level)
> +{
> + struct console *con;
> +
> + for_each_console(con) {
> + if (!(con->flags & CON_ENABLED))
> + continue;
> + if (con->write_atomic && level < emergency_console_loglevel)
> + return true;
> + if (con->write && (con->flags & CON_BOOT))
> + return true;
> + }
> + return false;
> +}
> +
> +static void call_emergency_console_drivers(int level, const char *text,
> + size_t text_len)
> +{
> + struct console *con;
> +
> + for_each_console(con) {
> + if (!(con->flags & CON_ENABLED))
> + continue;
> + if (con->write_atomic && level < emergency_console_loglevel) {
> + con->write_atomic(con, text, text_len);
> + continue;
> + }
> + if (con->write && (con->flags & CON_BOOT)) {
> + con->write(con, text, text_len);
> + continue;
> + }
> + }
> +}
> +
> +static void printk_emergency(char *buffer, int level, u64 ts_nsec, u16 cpu,
> + char *text, u16 text_len)
> +{
> + struct printk_log msg;
> + size_t prefix_len;
> +
> + if (!console_can_emergency(level))
> + return;
> +
> + msg.level = level;
> + msg.ts_nsec = ts_nsec;
> + msg.cpu = cpu;
> + msg.facility = 0;
> +
> + /* "text" must have PREFIX_MAX preceding bytes available */
> +
> + prefix_len = print_prefix(&msg,
> + console_msg_format & MSG_FORMAT_SYSLOG,
> + printk_time, buffer);
> + /* move the prefix forward to the beginning of the message text */
> + text -= prefix_len;
> + memmove(text, buffer, prefix_len);
> + text_len += prefix_len;
> +
> + text[text_len++] = '\n';
> +
> + call_emergency_console_drivers(level, text, text_len);
So this iterates the console list and calls consoles' callbacks, but what
prevents console driver to be rmmod-ed under us?
CPU0 CPU1
printk_emergency() rmmod netcon
call_emergency_console_drivers()
con_foo->flags & CON_ENABLED == 1
unregister_console(con_foo)
con_foo->flags &= ~CON_ENABLED
__exit // con_foo gone ?
con_foo->write()
We use console_lock()/console_trylock() in order to protect the list and
console drivers; but this brings scheduler to the picture, with all its
locks.
Or am I missing something?
-ss
^ permalink raw reply
* Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread
From: Sergey Senozhatsky @ 2019-03-07 6:51 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, linux-kernel,
Peter Zijlstra, Steven Rostedt, Daniel Wang, Andrew Morton,
Linus Torvalds, Greg Kroah-Hartman, Alan Cox, Jiri Slaby,
Peter Feiner, linux-serial
In-Reply-To: <20190307064152.GA425@jagdpanzerIV>
On (03/07/19 15:41), Sergey Senozhatsky wrote:
> This sounds concerning. IMHO, netconsole is too important to rely
> on preemptible printk and scheduler. Especially those netcons which
> run in "report only when oops_in_progress" mode. Sometimes netconsole
> is the fastest console available, but preemptible printk may turn it
> into the slowest one.
+ oops may end by the time kthread becomes active.
-ss
^ permalink raw reply
* Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread
From: Sergey Senozhatsky @ 2019-03-07 6:41 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Sergey Senozhatsky, linux-kernel, Peter Zijlstra,
Steven Rostedt, Daniel Wang, Andrew Morton, Linus Torvalds,
Greg Kroah-Hartman, Alan Cox, Jiri Slaby, Peter Feiner,
linux-serial, Sergey Senozhatsky
In-Reply-To: <87k1hbbq2m.fsf@linutronix.de>
On (03/06/19 23:22), John Ogness wrote:
> On 2019-03-06, Petr Mladek <pmladek@suse.com> wrote:
> >> _Both_ categories are important for the user, but their requirements
> >> are different:
> >>
> >> informational: non-disturbing
> >> emergency: reliable
> >
> > Isn't this already handled by the console_level?
> >
> > The informational messages can be reliably read via syslog, /dev/kmsg.
> > They are related to the normal works when the system works well.
> >
> > The emergency messages (errors, warnings) are printed in emergency
> > situations. They are printed as reliably as possible to the console
> > because the userspace might not be reliable enough.
>
> I've never viewed console_level this way. _If_ console_level really is
> supposed to define the emergency/informational boundary, all
> informational messages are supposed to be handled by userspace, and
> console printing's main objective is reliability... then I would change
> my proposal such that:
OK, you guys are ahead of me.
FB folks want to have a per-console sysfs knob to dynamically adjust
loglevel on each console. The use case is to temporarily set loglevel
to, say, debug on fast consoles, gather some data/logs, set loglevel
back to less verbose afterwards.
Preserving the existing loglevel behaviour looks right to me.
> - if a console supports write_atomic(), _all_ console printing for that
> console would use write_atomic()
Sounds right.
But Big-Konsole-Lock looks suspicious.
> - only consoles without write_atomic() will be printing via the
> printk-kthread(s)
>
> IMO, for consoles with write_atomic(), this would increase reliability
> over the current mainline implementation. It would also simplify
> write_atomic() implementations because they would no longer need to
> synchronize against write().
[..]
> For those consoles that cannot implement write_atomic() (vt and
> netconsole come to mind), or as a transition period until remaining
> console drivers have implemented write_atomic(), these would use the
> "fallback" of printing fully preemptively in their own kthread using
> write().
This sounds concerning. IMHO, netconsole is too important to rely
on preemptible printk and scheduler. Especially those netcons which
run in "report only when oops_in_progress" mode. Sometimes netconsole
is the fastest console available, but preemptible printk may turn it
into the slowest one.
-ss
^ permalink raw reply
* Re: [RFC PATCH v1 08/25] printk: add ring buffer and kthread
From: Sergey Senozhatsky @ 2019-03-07 5:15 UTC (permalink / raw)
To: John Ogness
Cc: Sergey Senozhatsky, linux-kernel, Peter Zijlstra, Petr Mladek,
Steven Rostedt, Daniel Wang, Andrew Morton, Linus Torvalds,
Greg Kroah-Hartman, Alan Cox, Jiri Slaby, Peter Feiner,
linux-serial, Sergey Senozhatsky
In-Reply-To: <87o96p9gtx.fsf@linutronix.de>
Hi John,
On (03/05/19 22:00), John Ogness wrote:
> Hi Sergey,
>
[..]
> Console printing is a convenient feature to allow a kernel to
> communicate information to a user without any reliance on
> userspace. IMHO there are 2 categories of messages that the kernel will
> communicate. The first is informational (usb events, wireless and
> ethernet connectivity, filesystem events, etc.). Since this category of
> messages occurs during normal runtime, we should expect that it does not
> cause adverse effects to the rest of the system (such as latencies and
> non-deterministic behavior).
>
> The second category is for emergency situations, where the kernel needs
> to report something unusual (panic, BUG, WARN, etc.). In some of these
> situations, it may be the last thing the kernel ever does. We should
> expect this category to focus on getting the message out as reliably as
> possible. Even if it means disturbing the system with large latencies.
>
> _Both_ categories are important for the user, but their requirements are
> different:
>
> informational: non-disturbing
> emergency: reliable
That's one way of looking at this. And it's reasonable.
Another way could be:
- anything that passes the loglevel check (suppress_message_printing())
is considered to be important
- anything else is just "noise" which should be suppressed. This
is what loglevel and suppress_message_printing() are for - to tell
the kernel what we want and what we don't want to be on the consoles.
> But what if can't be implemented? vt console, for example? Yes, the vt
> console would be tricky. It doesn't even support the current
> bust_spinlocks/oops_in_progress. But since the emergency category has a
> clear requirement (reliability)
"Reliability" - yes; the existence of emergency messages - no.
"to report something unusual (panic, BUG, WARN, etc.). In some of
these situations, it may be the last thing the kernel ever does."
But so may be the "informational" message. For example, not all ARCHs
sport NMI to detect and warn about a lockup/deadlock somewhere in usb
or wifi. The "informational" can be the last thing the kernel has to
say.
> The current printk implementation will do a better job of getting the
> informational messages out, but at an enormous cost to all the tasks
> on the system (including the realtime tasks). I am proposing a printk
> implementation where the tasks are not affected by console printing
> floods.
In new printk design the tasks are still affected by printing floods.
Tasks have to line up and (busy) wait for each other, regardless of
contexts.
One of the late patch sets which I had (I never ever published it) was
a different kind of printk-kthread offloading. The idea was that whatever
should be printed (suppress_message_printing()) should be printed. We
obviously can't loop in console_unlock() for ever and there is only one
way to figure out if we can print out more messages, that's why printk
became RCU stall detector and watchdog aware; and printk would break
out and wake up printk_kthread if it sees that watchdog is about to get
angry on that particular CPU. printk_kthread would run with preemption
disabled and do the same thing: if it spent watchdog_threshold / 2
printing - breakout, enable local IRQ, cond_resched(). IOW watchdogs
determine how much time we can spend on printing.
[..]
> I want messages of the information category to cause no disturbance to
> the system. Give the kernel the freedom to communicate to users without
> destroying its own performance. This can only be achieved if the
> messages are printed from a _fully_ preemptible context.
[..]
> And I want messages of the emergency category to be as reliable as
> possible, regardless of the costs to the system. Give the kernel a
> clear mechanism to _reliably_ communicate critical information.
> Such messages should never appear on a correctly functioning system.
I don't really understand the role of loglevel anymore.
When I do ./a.out --loglevel=X I have a clear understanding that
all messages which fall into [critical, X] range will be in the logs,
because I told that application that those messages are important to
me right now. And it used to be the same with the kernel loglevel.
But now the kernel will do its own thing:
- what the kernel considers important will go into the logs
- what the kernel doesn't consider important _maybe_ will end up
in the logs (preemptible printk kthread). And this is where
loglevel now. After the _maybe_ part.
If I'm not mistaken, Tetsuo reported that on a box under heavy OOM
pressure he saw preemptible printk dragging 5 minutes behind the
logbuf head. Preemptible printk is good for nothing. It's beyond
useless, it's something else.
-ss
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Sergey Senozhatsky @ 2019-03-07 2:22 UTC (permalink / raw)
To: John Ogness
Cc: Petr Mladek, Nigel Croxon, Theodore Y. Ts'o,
Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky, dm-devel,
Mikulas Patocka, linux-serial
In-Reply-To: <87ftrzbp3y.fsf@linutronix.de>
On (03/06/19 23:43), John Ogness wrote:
> > Sounds like another aurgment for the new printk design.
>
> Assuming the bad checksum messages are considered an emergency (for
> example, at least loglevel KERN_WARN), then the new printk design would
> print those messages synchronously to the slow serial line in the
> context of the driver as the driver is producing them.
I wonder if we have several CPU doing the checksum verification.
New sync printk mechanism has a Big-Konsole-spin-Lock, all CPUs that
want to printk() should line up and wait, in whatever context they
currently are.
> There wouldn't be a lock-up, but it would definitely slow down the
> driver. The situation of "messages being produced faster than the kernel
> can write them" would never exist because the printk() call will only
> return after the writing is completed.
If the serial console is a really slow one and we have a CPU in atomic
context spinnig on prb_lock, while the prb_lock is always acquired by
other CPUs, then it may look like a lock-up scenario.
-ss
^ permalink raw reply
* Re: [RFC PATCH v1 02/25] printk-rb: add prb locking functions
From: Sergey Senozhatsky @ 2019-03-07 2:12 UTC (permalink / raw)
To: John Ogness
Cc: linux-kernel, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
Steven Rostedt, Daniel Wang, Andrew Morton, Linus Torvalds,
Greg Kroah-Hartman, Alan Cox, Jiri Slaby, Peter Feiner,
linux-serial, Sergey Senozhatsky
In-Reply-To: <20190212143003.48446-3-john.ogness@linutronix.de>
On (02/12/19 15:29), John Ogness wrote:
> +static bool __prb_trylock(struct prb_cpulock *cpu_lock,
> + unsigned int *cpu_store)
> +{
> + unsigned long *flags;
> + unsigned int cpu;
> +
> + cpu = get_cpu();
> +
> + *cpu_store = atomic_read(&cpu_lock->owner);
> + /* memory barrier to ensure the current lock owner is visible */
> + smp_rmb();
> + if (*cpu_store == -1) {
> + flags = per_cpu_ptr(cpu_lock->irqflags, cpu);
> + local_irq_save(*flags);
> + if (atomic_try_cmpxchg_acquire(&cpu_lock->owner,
> + cpu_store, cpu)) {
> + return true;
> + }
> + local_irq_restore(*flags);
> + } else if (*cpu_store == cpu) {
> + return true;
> + }
> +
> + put_cpu();
> + return false;
> +}
> +
> +/*
> + * prb_lock: Perform a processor-reentrant spin lock.
> + * @cpu_lock: A pointer to the lock object.
> + * @cpu_store: A "flags" pointer to store lock status information.
> + *
> + * If no processor has the lock, the calling processor takes the lock and
> + * becomes the owner. If the calling processor is already the owner of the
> + * lock, this function succeeds immediately. If lock is locked by another
> + * processor, this function spins until the calling processor becomes the
> + * owner.
> + *
> + * It is safe to call this function from any context and state.
> + */
> +void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store)
> +{
> + for (;;) {
> + if (__prb_trylock(cpu_lock, cpu_store))
> + break;
> + cpu_relax();
> + }
> +}
Any chance to make it more fair? A ticket based lock, perhaps?
-ss
^ permalink raw reply
* Re: Serial console is causing system lock-up
From: Sergey Senozhatsky @ 2019-03-07 1:56 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Petr Mladek, Nigel Croxon, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, dm-devel, linux-serial
In-Reply-To: <alpine.LRH.2.02.1903061031420.16905@file01.intranet.prod.int.rdu2.redhat.com>
On (03/06/19 11:07), Mikulas Patocka wrote:
> This bug only happens if we select large logbuffer (millions of
> characters). With smaller log buffer, there are messages "** X printk
> messages dropped", but there's no lockup.
>
> The kernel apparently puts 2 million characters into a console log buffer,
> then takes some lock and than tries to write all of them to a slow serial
> line.
Do you run a preemtible kernel?
-ss
^ permalink raw reply
* [PATCH v11 4/4] serial: 8250-mtk: modify uart DMA rx
From: Long Cheng @ 2019-03-07 1:45 UTC (permalink / raw)
To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
Sean Wang, Nicolas Boichat, Matthias Brugger
Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
Zhenbao Liu, Long Cheng
In-Reply-To: <1551923135-32479-1-git-send-email-long.cheng@mediatek.com>
Modify uart rx and complete for DMA.
Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
drivers/tty/serial/8250/8250_mtk.c | 53 ++++++++++++++++--------------------
1 file changed, 23 insertions(+), 30 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index e2c4076..f9b2fd5 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -30,7 +30,6 @@
#define MTK_UART_DMA_EN_TX 0x2
#define MTK_UART_DMA_EN_RX 0x5
-#define MTK_UART_TX_SIZE UART_XMIT_SIZE
#define MTK_UART_RX_SIZE 0x8000
#define MTK_UART_TX_TRIGGER 1
#define MTK_UART_RX_TRIGGER MTK_UART_RX_SIZE
@@ -64,28 +63,30 @@ static void mtk8250_dma_rx_complete(void *param)
struct mtk8250_data *data = up->port.private_data;
struct tty_port *tty_port = &up->port.state->port;
struct dma_tx_state state;
+ int copied, cnt, tmp;
unsigned char *ptr;
- int copied;
- dma_sync_single_for_cpu(dma->rxchan->device->dev, dma->rx_addr,
- dma->rx_size, DMA_FROM_DEVICE);
+ if (data->rx_status == DMA_RX_SHUTDOWN)
+ return;
dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
+ cnt = dma->rx_size - state.residue;
+ tmp = cnt;
- if (data->rx_status == DMA_RX_SHUTDOWN)
- return;
+ if ((data->rx_pos + cnt) > dma->rx_size)
+ tmp = dma->rx_size - data->rx_pos;
- if ((data->rx_pos + state.residue) <= dma->rx_size) {
- ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
- copied = tty_insert_flip_string(tty_port, ptr, state.residue);
- } else {
- ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
- copied = tty_insert_flip_string(tty_port, ptr,
- dma->rx_size - data->rx_pos);
+ ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
+ copied = tty_insert_flip_string(tty_port, ptr, tmp);
+ data->rx_pos += tmp;
+
+ if (cnt > tmp) {
ptr = (unsigned char *)(dma->rx_buf);
- copied += tty_insert_flip_string(tty_port, ptr,
- data->rx_pos + state.residue - dma->rx_size);
+ tmp = cnt - tmp;
+ copied += tty_insert_flip_string(tty_port, ptr, tmp);
+ data->rx_pos = tmp;
}
+
up->port.icount.rx += copied;
tty_flip_buffer_push(tty_port);
@@ -96,9 +97,7 @@ static void mtk8250_dma_rx_complete(void *param)
static void mtk8250_rx_dma(struct uart_8250_port *up)
{
struct uart_8250_dma *dma = up->dma;
- struct mtk8250_data *data = up->port.private_data;
struct dma_async_tx_descriptor *desc;
- struct dma_tx_state state;
desc = dmaengine_prep_slave_single(dma->rxchan, dma->rx_addr,
dma->rx_size, DMA_DEV_TO_MEM,
@@ -113,12 +112,6 @@ static void mtk8250_rx_dma(struct uart_8250_port *up)
dma->rx_cookie = dmaengine_submit(desc);
- dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
- data->rx_pos = state.residue;
-
- dma_sync_single_for_device(dma->rxchan->device->dev, dma->rx_addr,
- dma->rx_size, DMA_FROM_DEVICE);
-
dma_async_issue_pending(dma->rxchan);
}
@@ -131,13 +124,13 @@ static void mtk8250_dma_enable(struct uart_8250_port *up)
if (data->rx_status != DMA_RX_START)
return;
- dma->rxconf.direction = DMA_DEV_TO_MEM;
- dma->rxconf.src_addr_width = dma->rx_size / 1024;
- dma->rxconf.src_addr = dma->rx_addr;
+ dma->rxconf.direction = DMA_DEV_TO_MEM;
+ dma->rxconf.src_port_window_size = dma->rx_size;
+ dma->rxconf.src_addr = dma->rx_addr;
- dma->txconf.direction = DMA_MEM_TO_DEV;
- dma->txconf.dst_addr_width = MTK_UART_TX_SIZE / 1024;
- dma->txconf.dst_addr = dma->tx_addr;
+ dma->txconf.direction = DMA_MEM_TO_DEV;
+ dma->txconf.dst_port_window_size = UART_XMIT_SIZE;
+ dma->txconf.dst_addr = dma->tx_addr;
serial_out(up, UART_FCR, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR |
UART_FCR_CLEAR_XMIT);
@@ -217,7 +210,7 @@ static void mtk8250_shutdown(struct uart_port *port)
* Mediatek UARTs use an extra highspeed register (UART_MTK_HIGHS)
*
* We need to recalcualte the quot register, as the claculation depends
- * on the vaule in the highspeed register.
+ * on the value in the highspeed register.
*
* Some baudrates are not supported by the chip, so we use the next
* lower rate supported and update termios c_flag.
--
1.7.9.5
^ permalink raw reply related
* [PATCH v11 3/4] dt-bindings: dma: uart: rename binding
From: Long Cheng @ 2019-03-07 1:45 UTC (permalink / raw)
To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
Sean Wang, Nicolas Boichat, Matthias Brugger
Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
Zhenbao Liu, Long Cheng
In-Reply-To: <1551923135-32479-1-git-send-email-long.cheng@mediatek.com>
The filename matches mtk-uart-apdma.c.
So using "mtk-uart-apdma.txt" should be better.
And add some property.
Signed-off-by: Long Cheng <long.cheng@mediatek.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
.../devicetree/bindings/dma/8250_mtk_dma.txt | 33 ------------
.../devicetree/bindings/dma/mtk-uart-apdma.txt | 55 ++++++++++++++++++++
2 files changed, 55 insertions(+), 33 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
create mode 100644 Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
diff --git a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt b/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
deleted file mode 100644
index 3fe0961..0000000
--- a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-* Mediatek UART APDMA Controller
-
-Required properties:
-- compatible should contain:
- * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
- * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
-
-- reg: The base address of the APDMA register bank.
-
-- interrupts: A single interrupt specifier.
-
-- clocks : Must contain an entry for each entry in clock-names.
- See ../clocks/clock-bindings.txt for details.
-- clock-names: The APDMA clock for register accesses
-
-Examples:
-
- apdma: dma-controller@11000380 {
- compatible = "mediatek,mt2712-uart-dma";
- reg = <0 0x11000380 0 0x400>;
- interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 64 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 65 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 66 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 67 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 68 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 69 IRQ_TYPE_LEVEL_LOW>,
- <GIC_SPI 70 IRQ_TYPE_LEVEL_LOW>;
- clocks = <&pericfg CLK_PERI_AP_DMA>;
- clock-names = "apdma";
- #dma-cells = <1>;
- };
-
diff --git a/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
new file mode 100644
index 0000000..e0424b3
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
@@ -0,0 +1,55 @@
+* Mediatek UART APDMA Controller
+
+Required properties:
+- compatible should contain:
+ * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
+ * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
+
+- reg: The base address of the APDMA register bank.
+
+- interrupts: A single interrupt specifier.
+ One interrupt per dma-requests, or 8 if no dma-requests property is present
+
+- dma-requests: The number of DMA channels
+
+- clocks : Must contain an entry for each entry in clock-names.
+ See ../clocks/clock-bindings.txt for details.
+- clock-names: The APDMA clock for register accesses
+
+- mediatek,dma-33bits: Present if the DMA requires support
+
+Examples:
+
+ apdma: dma-controller@11000400 {
+ compatible = "mediatek,mt2712-uart-dma";
+ reg = <0 0x11000400 0 0x80>,
+ <0 0x11000480 0 0x80>,
+ <0 0x11000500 0 0x80>,
+ <0 0x11000580 0 0x80>,
+ <0 0x11000600 0 0x80>,
+ <0 0x11000680 0 0x80>,
+ <0 0x11000700 0 0x80>,
+ <0 0x11000780 0 0x80>,
+ <0 0x11000800 0 0x80>,
+ <0 0x11000880 0 0x80>,
+ <0 0x11000900 0 0x80>,
+ <0 0x11000980 0 0x80>;
+ interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+ dma-requests = <12>;
+ clocks = <&pericfg CLK_PERI_AP_DMA>;
+ clock-names = "apdma";
+ mediatek,dma-33bits;
+ #dma-cells = <1>;
+ };
+
--
1.7.9.5
^ permalink raw reply related
* [PATCH v11 2/4] arm: dts: mt2712: add uart APDMA to device tree
From: Long Cheng @ 2019-03-07 1:45 UTC (permalink / raw)
To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
Sean Wang, Nicolas Boichat, Matthias Brugger
Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
Zhenbao Liu, Long Cheng
In-Reply-To: <1551923135-32479-1-git-send-email-long.cheng@mediatek.com>
1. add uart APDMA controller device node
2. add uart 0/1/2/3/4/5 DMA function
Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
arch/arm64/boot/dts/mediatek/mt2712e.dtsi | 51 +++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
index ee627a7..3469a6c 100644
--- a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
@@ -298,6 +298,9 @@
interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 10
+ &apdma 11>;
+ dma-names = "tx", "rx";
status = "disabled";
};
@@ -346,6 +349,39 @@
(GIC_CPU_MASK_RAW(0x13) | IRQ_TYPE_LEVEL_HIGH)>;
};
+ apdma: dma-controller@11000400 {
+ compatible = "mediatek,mt2712-uart-dma",
+ "mediatek,mt6577-uart-dma";
+ reg = <0 0x11000400 0 0x80>,
+ <0 0x11000480 0 0x80>,
+ <0 0x11000500 0 0x80>,
+ <0 0x11000580 0 0x80>,
+ <0 0x11000600 0 0x80>,
+ <0 0x11000680 0 0x80>,
+ <0 0x11000700 0 0x80>,
+ <0 0x11000780 0 0x80>,
+ <0 0x11000800 0 0x80>,
+ <0 0x11000880 0 0x80>,
+ <0 0x11000900 0 0x80>,
+ <0 0x11000980 0 0x80>;
+ interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+ <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+ dma-requests = <12>;
+ clocks = <&pericfg CLK_PERI_AP_DMA>;
+ clock-names = "apdma";
+ #dma-cells = <1>;
+ };
+
auxadc: adc@11001000 {
compatible = "mediatek,mt2712-auxadc";
reg = <0 0x11001000 0 0x1000>;
@@ -362,6 +398,9 @@
interrupts = <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 0
+ &apdma 1>;
+ dma-names = "tx", "rx";
status = "disabled";
};
@@ -372,6 +411,9 @@
interrupts = <GIC_SPI 92 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 2
+ &apdma 3>;
+ dma-names = "tx", "rx";
status = "disabled";
};
@@ -382,6 +424,9 @@
interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 4
+ &apdma 5>;
+ dma-names = "tx", "rx";
status = "disabled";
};
@@ -392,6 +437,9 @@
interrupts = <GIC_SPI 94 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 6
+ &apdma 7>;
+ dma-names = "tx", "rx";
status = "disabled";
};
@@ -402,6 +450,9 @@
interrupts = <GIC_SPI 126 IRQ_TYPE_LEVEL_LOW>;
clocks = <&baud_clk>, <&sys_clk>;
clock-names = "baud", "bus";
+ dmas = <&apdma 8
+ &apdma 9>;
+ dma-names = "tx", "rx";
status = "disabled";
};
--
1.7.9.5
^ permalink raw reply related
* [PATCH v11 1/4] dmaengine: 8250_mtk_dma: add MediaTek uart DMA support
From: Long Cheng @ 2019-03-07 1:45 UTC (permalink / raw)
To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
Sean Wang, Nicolas Boichat, Matthias Brugger
Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
Zhenbao Liu, Long Cheng
In-Reply-To: <1551923135-32479-1-git-send-email-long.cheng@mediatek.com>
In DMA engine framework, add 8250 uart dma to support MediaTek uart.
If MediaTek uart enabled(SERIAL_8250_MT6577), and want to improve
the performance, can enable the function.
Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
drivers/dma/mediatek/Kconfig | 11 +
drivers/dma/mediatek/Makefile | 1 +
drivers/dma/mediatek/mtk-uart-apdma.c | 660 +++++++++++++++++++++++++++++++++
3 files changed, 672 insertions(+)
create mode 100644 drivers/dma/mediatek/mtk-uart-apdma.c
diff --git a/drivers/dma/mediatek/Kconfig b/drivers/dma/mediatek/Kconfig
index 680fc05..ac49eb6 100644
--- a/drivers/dma/mediatek/Kconfig
+++ b/drivers/dma/mediatek/Kconfig
@@ -24,3 +24,14 @@ config MTK_CQDMA
This controller provides the channels which is dedicated to
memory-to-memory transfer to offload from CPU.
+
+config MTK_UART_APDMA
+ tristate "MediaTek SoCs APDMA support for UART"
+ depends on OF && SERIAL_8250_MT6577
+ select DMA_ENGINE
+ select DMA_VIRTUAL_CHANNELS
+ help
+ Support for the UART DMA engine found on MediaTek MTK SoCs.
+ When SERIAL_8250_MT6577 is enabled, and if you want to use DMA,
+ you can enable the config. The DMA engine can only be used
+ with MediaTek SoCs.
diff --git a/drivers/dma/mediatek/Makefile b/drivers/dma/mediatek/Makefile
index 41bb381..61a6d29 100644
--- a/drivers/dma/mediatek/Makefile
+++ b/drivers/dma/mediatek/Makefile
@@ -1,2 +1,3 @@
+obj-$(CONFIG_MTK_UART_APDMA) += mtk-uart-apdma.o
obj-$(CONFIG_MTK_HSDMA) += mtk-hsdma.o
obj-$(CONFIG_MTK_CQDMA) += mtk-cqdma.o
diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
new file mode 100644
index 0000000..9ed7a49
--- /dev/null
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -0,0 +1,660 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MediaTek Uart APDMA driver.
+ *
+ * Copyright (c) 2018 MediaTek Inc.
+ * Author: Long Cheng <long.cheng@mediatek.com>
+ */
+
+#include <linux/clk.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/of_dma.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "../virt-dma.h"
+
+/* The default number of virtual channel */
+#define MTK_UART_APDMA_NR_VCHANS 8
+
+#define VFF_EN_B BIT(0)
+#define VFF_STOP_B BIT(0)
+#define VFF_FLUSH_B BIT(0)
+#define VFF_4G_SUPPORT_B BIT(0)
+#define VFF_RX_INT_EN0_B BIT(0) /* rx valid size >= vff thre */
+#define VFF_RX_INT_EN1_B BIT(1)
+#define VFF_TX_INT_EN_B BIT(0) /* tx left size >= vff thre */
+#define VFF_WARM_RST_B BIT(0)
+#define VFF_RX_INT_CLR_B (BIT(0) | BIT(1))
+#define VFF_TX_INT_CLR_B 0
+#define VFF_STOP_CLR_B 0
+#define VFF_INT_EN_CLR_B 0
+#define VFF_4G_SUPPORT_CLR_B 0
+
+/* interrupt trigger level for tx */
+#define VFF_TX_THRE(n) ((n) * 7 / 8)
+/* interrupt trigger level for rx */
+#define VFF_RX_THRE(n) ((n) * 3 / 4)
+
+#define VFF_RING_SIZE 0xffffU
+/* invert this bit when wrap ring head again */
+#define VFF_RING_WRAP 0x10000U
+
+#define VFF_INT_FLAG 0x00
+#define VFF_INT_EN 0x04
+#define VFF_EN 0x08
+#define VFF_RST 0x0c
+#define VFF_STOP 0x10
+#define VFF_FLUSH 0x14
+#define VFF_ADDR 0x1c
+#define VFF_LEN 0x24
+#define VFF_THRE 0x28
+#define VFF_WPT 0x2c
+#define VFF_RPT 0x30
+/* TX: the buffer size HW can read. RX: the buffer size SW can read. */
+#define VFF_VALID_SIZE 0x3c
+/* TX: the buffer size SW can write. RX: the buffer size HW can write. */
+#define VFF_LEFT_SIZE 0x40
+#define VFF_DEBUG_STATUS 0x50
+#define VFF_4G_SUPPORT 0x54
+
+struct mtk_uart_apdmadev {
+ struct dma_device ddev;
+ struct clk *clk;
+ bool support_33bits;
+ unsigned int dma_requests;
+ unsigned int *dma_irq;
+};
+
+struct mtk_uart_apdma_desc {
+ struct virt_dma_desc vd;
+
+ unsigned int avail_len;
+};
+
+struct mtk_chan {
+ struct virt_dma_chan vc;
+ struct dma_slave_config cfg;
+ void __iomem *base;
+ struct mtk_uart_apdma_desc *desc;
+
+ enum dma_transfer_direction dir;
+
+ bool requested;
+
+ unsigned int rx_status;
+};
+
+static inline struct mtk_uart_apdmadev *
+to_mtk_uart_apdma_dev(struct dma_device *d)
+{
+ return container_of(d, struct mtk_uart_apdmadev, ddev);
+}
+
+static inline struct mtk_chan *to_mtk_uart_apdma_chan(struct dma_chan *c)
+{
+ return container_of(c, struct mtk_chan, vc.chan);
+}
+
+static inline struct mtk_uart_apdma_desc *to_mtk_uart_apdma_desc
+ (struct dma_async_tx_descriptor *t)
+{
+ return container_of(t, struct mtk_uart_apdma_desc, vd.tx);
+}
+
+static void mtk_uart_apdma_write(struct mtk_chan *c,
+ unsigned int reg, unsigned int val)
+{
+ writel(val, c->base + reg);
+}
+
+static unsigned int mtk_uart_apdma_read(struct mtk_chan *c, unsigned int reg)
+{
+ return readl(c->base + reg);
+}
+
+static void mtk_uart_apdma_desc_free(struct virt_dma_desc *vd)
+{
+ struct dma_chan *chan = vd->tx.chan;
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+ kfree(c->desc);
+}
+
+static void mtk_uart_apdma_start_tx(struct mtk_chan *c)
+{
+ unsigned int len, send, left, wpt, d_wpt, tmp;
+ int ret;
+
+ left = mtk_uart_apdma_read(c, VFF_LEFT_SIZE);
+ if (!left) {
+ mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+ return;
+ }
+
+ /* Wait 1sec for flush, can't sleep */
+ ret = readx_poll_timeout(readl, c->base + VFF_FLUSH, tmp,
+ tmp != VFF_FLUSH_B, 0, 1000000);
+ if (ret)
+ dev_warn(c->vc.chan.device->dev, "tx: fail, debug=0x%x\n",
+ mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+ send = min_t(unsigned int, left, c->desc->avail_len);
+ wpt = mtk_uart_apdma_read(c, VFF_WPT);
+ len = c->cfg.dst_port_window_size;
+
+ d_wpt = wpt + send;
+ if ((d_wpt & VFF_RING_SIZE) >= len) {
+ d_wpt = d_wpt - len;
+ d_wpt = d_wpt ^ VFF_RING_WRAP;
+ }
+ mtk_uart_apdma_write(c, VFF_WPT, d_wpt);
+
+ c->desc->avail_len -= send;
+
+ mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+ if (mtk_uart_apdma_read(c, VFF_FLUSH) == 0U)
+ mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+}
+
+static void mtk_uart_apdma_start_rx(struct mtk_chan *c)
+{
+ struct mtk_uart_apdma_desc *d = c->desc;
+ unsigned int len, wg, rg;
+ int cnt;
+
+ if ((mtk_uart_apdma_read(c, VFF_VALID_SIZE) == 0U) ||
+ !d || !vchan_next_desc(&c->vc))
+ return;
+
+ len = c->cfg.src_port_window_size;
+ rg = mtk_uart_apdma_read(c, VFF_RPT);
+ wg = mtk_uart_apdma_read(c, VFF_WPT);
+ cnt = (wg & VFF_RING_SIZE) - (rg & VFF_RING_SIZE);
+ /*
+ * The buffer is ring buffer. If wrap bit different,
+ * represents the start of the next cycle for WPT
+ */
+ if ((rg ^ wg) & VFF_RING_WRAP)
+ cnt += len;
+
+ c->rx_status = d->avail_len - cnt;
+ mtk_uart_apdma_write(c, VFF_RPT, wg);
+
+ list_del(&d->vd.node);
+ vchan_cookie_complete(&d->vd);
+}
+
+static irqreturn_t mtk_uart_apdma_irq_handler(int irq, void *dev_id)
+{
+ struct dma_chan *chan = (struct dma_chan *)dev_id;
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ struct mtk_uart_apdma_desc *d;
+ unsigned long flags;
+
+ spin_lock_irqsave(&c->vc.lock, flags);
+ if (c->dir == DMA_DEV_TO_MEM) {
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+ mtk_uart_apdma_start_rx(c);
+ } else if (c->dir == DMA_MEM_TO_DEV) {
+ d = c->desc;
+
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+ if (d->avail_len != 0U) {
+ mtk_uart_apdma_start_tx(c);
+ } else {
+ list_del(&d->vd.node);
+ vchan_cookie_complete(&d->vd);
+ }
+ }
+ spin_unlock_irqrestore(&c->vc.lock, flags);
+
+ return IRQ_HANDLED;
+}
+
+static int mtk_uart_apdma_alloc_chan_resources(struct dma_chan *chan)
+{
+ struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ unsigned int tmp;
+ int ret;
+
+ pm_runtime_get_sync(mtkd->ddev.dev);
+
+ mtk_uart_apdma_write(c, VFF_ADDR, 0);
+ mtk_uart_apdma_write(c, VFF_THRE, 0);
+ mtk_uart_apdma_write(c, VFF_LEN, 0);
+ mtk_uart_apdma_write(c, VFF_RST, VFF_WARM_RST_B);
+
+ ret = readx_poll_timeout(readl, c->base + VFF_EN, tmp, !tmp, 10, 100);
+ if (ret) {
+ dev_err(chan->device->dev, "dma reset: fail, timeout\n");
+ return ret;
+ }
+
+ if (!c->requested) {
+ c->requested = true;
+ ret = request_irq(mtkd->dma_irq[chan->chan_id],
+ mtk_uart_apdma_irq_handler, IRQF_TRIGGER_NONE,
+ KBUILD_MODNAME, chan);
+ if (ret < 0) {
+ dev_err(chan->device->dev, "Can't request dma IRQ\n");
+ return -EINVAL;
+ }
+ }
+
+ if (mtkd->support_33bits)
+ mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_SUPPORT_CLR_B);
+
+ return ret;
+}
+
+static void mtk_uart_apdma_free_chan_resources(struct dma_chan *chan)
+{
+ struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+ if (c->requested) {
+ c->requested = false;
+ free_irq(mtkd->dma_irq[chan->chan_id], chan);
+ }
+
+ tasklet_kill(&c->vc.task);
+
+ vchan_free_chan_resources(&c->vc);
+
+ pm_runtime_put_sync(mtkd->ddev.dev);
+}
+
+static enum dma_status mtk_uart_apdma_tx_status(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_tx_state *txstate)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ enum dma_status ret;
+
+ ret = dma_cookie_status(chan, cookie, txstate);
+
+ dma_set_residue(txstate, c->rx_status);
+
+ return ret;
+}
+
+static void mtk_uart_apdma_config_write(struct dma_chan *chan,
+ struct dma_slave_config *cfg,
+ enum dma_transfer_direction dir)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ struct mtk_uart_apdmadev *mtkd =
+ to_mtk_uart_apdma_dev(c->vc.chan.device);
+ unsigned int tmp;
+
+ if (mtk_uart_apdma_read(c, VFF_EN) == VFF_EN_B)
+ return;
+
+ c->dir = dir;
+
+ if (dir == DMA_DEV_TO_MEM) {
+ tmp = cfg->src_port_window_size;
+
+ mtk_uart_apdma_write(c, VFF_ADDR, cfg->src_addr);
+ mtk_uart_apdma_write(c, VFF_LEN, tmp);
+ mtk_uart_apdma_write(c, VFF_THRE, VFF_RX_THRE(tmp));
+ mtk_uart_apdma_write(c, VFF_INT_EN,
+ VFF_RX_INT_EN0_B | VFF_RX_INT_EN1_B);
+ mtk_uart_apdma_write(c, VFF_RPT, 0);
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+ } else if (dir == DMA_MEM_TO_DEV) {
+ tmp = cfg->dst_port_window_size;
+
+ mtk_uart_apdma_write(c, VFF_ADDR, cfg->dst_addr);
+ mtk_uart_apdma_write(c, VFF_LEN, tmp);
+ mtk_uart_apdma_write(c, VFF_THRE, VFF_TX_THRE(tmp));
+ mtk_uart_apdma_write(c, VFF_WPT, 0);
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+ }
+
+ mtk_uart_apdma_write(c, VFF_EN, VFF_EN_B);
+
+ if (mtkd->support_33bits)
+ mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_SUPPORT_B);
+
+ if (mtk_uart_apdma_read(c, VFF_EN) != VFF_EN_B)
+ dev_err(chan->device->dev, "dir[%d] fail\n", dir);
+}
+
+/*
+ * dmaengine_prep_slave_single will call the function. and sglen is 1.
+ * 8250 uart using one ring buffer, and deal with one sg.
+ */
+static struct dma_async_tx_descriptor *mtk_uart_apdma_prep_slave_sg
+ (struct dma_chan *chan, struct scatterlist *sgl,
+ unsigned int sglen, enum dma_transfer_direction dir,
+ unsigned long tx_flags, void *context)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ struct mtk_uart_apdma_desc *d;
+
+ if (!is_slave_direction(dir))
+ return NULL;
+
+ mtk_uart_apdma_config_write(chan, &c->cfg, dir);
+
+ /* Now allocate and setup the descriptor */
+ d = kzalloc(sizeof(*d), GFP_ATOMIC);
+ if (!d)
+ return NULL;
+
+ /* sglen is 1 */
+ d->avail_len = sg_dma_len(sgl);
+ c->rx_status = d->avail_len;
+
+ return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
+}
+
+static void mtk_uart_apdma_issue_pending(struct dma_chan *chan)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ struct virt_dma_desc *vd;
+ unsigned long flags;
+
+ spin_lock_irqsave(&c->vc.lock, flags);
+ if (vchan_issue_pending(&c->vc)) {
+ vd = vchan_next_desc(&c->vc);
+ c->desc = to_mtk_uart_apdma_desc(&vd->tx);
+ }
+
+ if (c->dir == DMA_DEV_TO_MEM)
+ mtk_uart_apdma_start_rx(c);
+ else if (c->dir == DMA_MEM_TO_DEV)
+ mtk_uart_apdma_start_tx(c);
+
+ spin_unlock_irqrestore(&c->vc.lock, flags);
+}
+
+static int mtk_uart_apdma_slave_config(struct dma_chan *chan,
+ struct dma_slave_config *config)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+ memcpy(&c->cfg, config, sizeof(*config));
+
+ return 0;
+}
+
+static int mtk_uart_apdma_terminate_all(struct dma_chan *chan)
+{
+ struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+ unsigned long flags;
+ unsigned int tmp;
+ int ret;
+
+ spin_lock_irqsave(&c->vc.lock, flags);
+
+ mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+ /* Wait 1sec for flush, can't sleep */
+ ret = readx_poll_timeout(readl, c->base + VFF_FLUSH, tmp,
+ tmp != VFF_FLUSH_B, 0, 1000000);
+ if (ret)
+ dev_err(c->vc.chan.device->dev, "flush: fail, debug=0x%x\n",
+ mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+ /* set stop as 1 -> wait until en is 0 -> set stop as 0 */
+ mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_B);
+ ret = readx_poll_timeout(readl, c->base + VFF_EN, tmp, !tmp, 10, 100);
+ if (ret)
+ dev_err(c->vc.chan.device->dev, "stop: fail, debug=0x%x\n",
+ mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+ mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_CLR_B);
+ mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+ if (c->dir == DMA_DEV_TO_MEM)
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+ else if (c->dir == DMA_MEM_TO_DEV)
+ mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+ spin_unlock_irqrestore(&c->vc.lock, flags);
+
+ return 0;
+}
+
+static int mtk_uart_apdma_device_pause(struct dma_chan *chan)
+{
+ /* just for check caps pass */
+ dev_err(chan->device->dev, "Pause can't support\n");
+
+ return 0;
+}
+
+static void mtk_uart_apdma_free(struct mtk_uart_apdmadev *mtkd)
+{
+ while (!list_empty(&mtkd->ddev.channels)) {
+ struct mtk_chan *c = list_first_entry(&mtkd->ddev.channels,
+ struct mtk_chan, vc.chan.device_node);
+
+ list_del(&c->vc.chan.device_node);
+ tasklet_kill(&c->vc.task);
+ }
+}
+
+static const struct of_device_id mtk_uart_apdma_match[] = {
+ { .compatible = "mediatek,mt6577-uart-dma", },
+ { /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, mtk_uart_apdma_match);
+
+static int mtk_uart_apdma_probe(struct platform_device *pdev)
+{
+ struct device_node *np = pdev->dev.of_node;
+ struct mtk_uart_apdmadev *mtkd;
+ struct resource *res;
+ struct mtk_chan *c;
+ int bit_mask = 32, rc;
+ unsigned int i;
+
+ mtkd = devm_kzalloc(&pdev->dev, sizeof(*mtkd), GFP_KERNEL);
+ if (!mtkd)
+ return -ENOMEM;
+
+ mtkd->clk = devm_clk_get(&pdev->dev, NULL);
+ if (IS_ERR(mtkd->clk)) {
+ dev_err(&pdev->dev, "No clock specified\n");
+ rc = PTR_ERR(mtkd->clk);
+ return rc;
+ }
+
+ if (of_property_read_bool(np, "mediatek,dma-33bits"))
+ mtkd->support_33bits = true;
+
+ if (mtkd->support_33bits)
+ bit_mask = 33;
+
+ rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(bit_mask));
+ if (rc)
+ return rc;
+
+ dma_cap_set(DMA_SLAVE, mtkd->ddev.cap_mask);
+ mtkd->ddev.device_alloc_chan_resources =
+ mtk_uart_apdma_alloc_chan_resources;
+ mtkd->ddev.device_free_chan_resources =
+ mtk_uart_apdma_free_chan_resources;
+ mtkd->ddev.device_tx_status = mtk_uart_apdma_tx_status;
+ mtkd->ddev.device_issue_pending = mtk_uart_apdma_issue_pending;
+ mtkd->ddev.device_prep_slave_sg = mtk_uart_apdma_prep_slave_sg;
+ mtkd->ddev.device_config = mtk_uart_apdma_slave_config;
+ mtkd->ddev.device_pause = mtk_uart_apdma_device_pause;
+ mtkd->ddev.device_terminate_all = mtk_uart_apdma_terminate_all;
+ mtkd->ddev.src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+ mtkd->ddev.dst_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+ mtkd->ddev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
+ mtkd->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
+ mtkd->ddev.dev = &pdev->dev;
+ INIT_LIST_HEAD(&mtkd->ddev.channels);
+
+ mtkd->dma_requests = MTK_UART_APDMA_NR_VCHANS;
+ if (of_property_read_u32(np, "dma-requests", &mtkd->dma_requests)) {
+ dev_info(&pdev->dev,
+ "Using %u as missing dma-requests property\n",
+ MTK_UART_APDMA_NR_VCHANS);
+ }
+
+ mtkd->dma_irq = devm_kcalloc(&pdev->dev, mtkd->dma_requests,
+ sizeof(*mtkd->dma_irq), GFP_KERNEL);
+ if (!mtkd->dma_irq)
+ return -ENOMEM;
+
+ for (i = 0; i < mtkd->dma_requests; i++) {
+ c = devm_kzalloc(mtkd->ddev.dev, sizeof(*c), GFP_KERNEL);
+ if (!c) {
+ rc = -ENODEV;
+ goto err_no_dma;
+ }
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+ if (!res) {
+ rc = -ENODEV;
+ goto err_no_dma;
+ }
+
+ c->base = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(c->base)) {
+ rc = PTR_ERR(c->base);
+ goto err_no_dma;
+ }
+ c->requested = false;
+ c->vc.desc_free = mtk_uart_apdma_desc_free;
+ vchan_init(&c->vc, &mtkd->ddev);
+
+ mtkd->dma_irq[i] = platform_get_irq(pdev, i);
+ if ((int)mtkd->dma_irq[i] < 0) {
+ dev_err(&pdev->dev, "failed to get IRQ[%d]\n", i);
+ rc = -EINVAL;
+ goto err_no_dma;
+ }
+ }
+
+ pm_runtime_enable(&pdev->dev);
+ pm_runtime_set_active(&pdev->dev);
+
+ rc = dma_async_device_register(&mtkd->ddev);
+ if (rc)
+ goto rpm_disable;
+
+ platform_set_drvdata(pdev, mtkd);
+
+ /* Device-tree DMA controller registration */
+ rc = of_dma_controller_register(np, of_dma_xlate_by_chan_id, mtkd);
+ if (rc)
+ goto dma_remove;
+
+ return rc;
+
+dma_remove:
+ dma_async_device_unregister(&mtkd->ddev);
+rpm_disable:
+ pm_runtime_disable(&pdev->dev);
+err_no_dma:
+ mtk_uart_apdma_free(mtkd);
+ return rc;
+}
+
+static int mtk_uart_apdma_remove(struct platform_device *pdev)
+{
+ struct mtk_uart_apdmadev *mtkd = platform_get_drvdata(pdev);
+
+ if (pdev->dev.of_node)
+ of_dma_controller_free(pdev->dev.of_node);
+
+ pm_runtime_disable(&pdev->dev);
+ pm_runtime_put_noidle(&pdev->dev);
+
+ dma_async_device_unregister(&mtkd->ddev);
+ mtk_uart_apdma_free(mtkd);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int mtk_uart_apdma_suspend(struct device *dev)
+{
+ struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+ if (!pm_runtime_suspended(dev))
+ clk_disable_unprepare(mtkd->clk);
+
+ return 0;
+}
+
+static int mtk_uart_apdma_resume(struct device *dev)
+{
+ int ret;
+ struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+ if (!pm_runtime_suspended(dev)) {
+ ret = clk_prepare_enable(mtkd->clk);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
+#ifdef CONFIG_PM
+static int mtk_uart_apdma_runtime_suspend(struct device *dev)
+{
+ struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+ clk_disable_unprepare(mtkd->clk);
+
+ return 0;
+}
+
+static int mtk_uart_apdma_runtime_resume(struct device *dev)
+{
+ int ret;
+ struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+ ret = clk_prepare_enable(mtkd->clk);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+#endif /* CONFIG_PM */
+
+static const struct dev_pm_ops mtk_uart_apdma_pm_ops = {
+ SET_SYSTEM_SLEEP_PM_OPS(mtk_uart_apdma_suspend, mtk_uart_apdma_resume)
+ SET_RUNTIME_PM_OPS(mtk_uart_apdma_runtime_suspend,
+ mtk_uart_apdma_runtime_resume, NULL)
+};
+
+static struct platform_driver mtk_uart_apdma_driver = {
+ .probe = mtk_uart_apdma_probe,
+ .remove = mtk_uart_apdma_remove,
+ .driver = {
+ .name = KBUILD_MODNAME,
+ .pm = &mtk_uart_apdma_pm_ops,
+ .of_match_table = of_match_ptr(mtk_uart_apdma_match),
+ },
+};
+
+module_platform_driver(mtk_uart_apdma_driver);
+
+MODULE_DESCRIPTION("MediaTek UART APDMA Controller Driver");
+MODULE_AUTHOR("Long Cheng <long.cheng@mediatek.com>");
+MODULE_LICENSE("GPL v2");
+
--
1.7.9.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox