Embedded Linux development
 help / color / mirror / Atom feed
* Re: [PATCH RFC 1/5] scripts: Add sortextable to sort the kernel's exception table.
From: David Daney @ 2011-11-21 19:16 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: linux-mips, ralf, linux-kernel, linux-arch, linux-embedded, x86,
	David Daney
In-Reply-To: <201111211350.58916.vapier@gentoo.org>

On 11/21/2011 10:50 AM, Mike Frysinger wrote:
> On Monday 21 November 2011 13:25:36 David Daney wrote:
>> On 11/20/2011 03:22 PM, Mike Frysinger wrote:
>>> On Friday 18 November 2011 14:37:44 David Daney wrote:
>>>> +	switch (w2(ehdr->e_machine)) {
>>>> +	default:
>>>> +		fprintf(stderr, "unrecognized e_machine %d %s\n",
>>>> +			w2(ehdr->e_machine), fname);
>>>> +		fail_file();
>>>> +		break;
>>>> +	case EM_386:
>>>> +	case EM_MIPS:
>>>> +	case EM_X86_64:
>>>> +		break;
>>>> +	}  /* end switch */
>>>
>>> unlike recordmcount, this file doesn't do anything arch specific.  so
>>> let's just delete this and be done.
>>
>> Not really true at this point.  We don't know the size or layout of the
>> architecture specific exception table entries, likewise for
>> CONFIG_ARCH_HAS_SORT_EXTABLE, we don't even know how to do the comparison.
>
> all of your code that i could see is based on "is it 32bit or is it 64bit".
> there is no code that says "if it's x86, we need to do XXX".

At this point there is no need.  MIPS, i386 and x86_64 all store the key 
in the first word of a two word structure.

If there were some architecture that didn't fit this model, we would 
have to create a special sorting function and select it (and perhaps 
other parameters as well) in that switch statement.


>
> when i look in the kernel, we have common code behind ARCH_HAS_SORT_EXTABLE.
> so you could easily do the same thing:
>
> scripts/sortextable.c:
> 	#ifdef ARCH_HAS_SORT_EXTABLE
> 		switch (w2(ehdr->e_machine)) {
> 		default:
> 			fprintf(stderr, "unrecognized e_machine %d %s\n",
> 				w2(ehdr->e_machine), fname);
> 			... return a unique exit code like 77 ...
> 			break;
> 		/* add arch sorting info here */
> 		}  /* end switch */
> 	#endif
>
> kernel/extable.c:
> 	#if defined(ARCH_HAS_SORT_EXTABLE)&&  !defined(ARCH_HAS_SORTED_EXTABLE)
> 	void __init sort_main_extable(void)
> 	{
> 		sort_extable(__start___ex_table, __stop___ex_table);
> 	}
> 	#endif
>


Yes, I am familiar with that code.  One thing to keep in mind is that 
the compiler has access to struct exception_table_entry, and can easily 
figure out both how big the structure is *and* where the insn field is 
within the structure.

This is not the case for the author of sortextable.  Except for MIPS, 
MIPS64, i386 and x86_64, I know neither the size of struct 
exception_table_entry, nor the offset of its insn field.

For those with knowledge of the inner working of other architectures, it 
may be as simple as a two line patch to add support, but it isn't 
something that I want to take responsibility for at this point

> this way all the people not doing unique stuff work out of the box.  only the
> people who are doing funky stuff need to extend things.

I didn't want to include something like this that I cannot test.  An 
unsorted (or improperly sorted) exception table is not necessarily 
something that will be noticeable by simply booting the kernel.  Your 
only indication may be a panic or OOPS under rarely encountered conditions.

David Daney

^ permalink raw reply

* Re: [PATCH RFC 1/5] scripts: Add sortextable to sort the kernel's exception table.
From: Mike Frysinger @ 2011-11-21 20:08 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips, ralf, linux-kernel, linux-arch, linux-embedded, x86,
	David Daney
In-Reply-To: <4ECAA374.2040102@gmail.com>

[-- Attachment #1: Type: Text/Plain, Size: 3496 bytes --]

On Monday 21 November 2011 14:16:04 David Daney wrote:
> On 11/21/2011 10:50 AM, Mike Frysinger wrote:
> > On Monday 21 November 2011 13:25:36 David Daney wrote:
> >> On 11/20/2011 03:22 PM, Mike Frysinger wrote:
> >>> On Friday 18 November 2011 14:37:44 David Daney wrote:
> >>>> +	switch (w2(ehdr->e_machine)) {
> >>>> +	default:
> >>>> +		fprintf(stderr, "unrecognized e_machine %d %s\n",
> >>>> +			w2(ehdr->e_machine), fname);
> >>>> +		fail_file();
> >>>> +		break;
> >>>> +	case EM_386:
> >>>> +	case EM_MIPS:
> >>>> +	case EM_X86_64:
> >>>> +		break;
> >>>> +	}  /* end switch */
> >>> 
> >>> unlike recordmcount, this file doesn't do anything arch specific.  so
> >>> let's just delete this and be done.
> >> 
> >> Not really true at this point.  We don't know the size or layout of the
> >> architecture specific exception table entries, likewise for
> >> CONFIG_ARCH_HAS_SORT_EXTABLE, we don't even know how to do the
> >> comparison.
> > 
> > all of your code that i could see is based on "is it 32bit or is it
> > 64bit". there is no code that says "if it's x86, we need to do XXX".
> 
> At this point there is no need.  MIPS, i386 and x86_64 all store the key
> in the first word of a two word structure.
> 
> If there were some architecture that didn't fit this model, we would
> have to create a special sorting function and select it (and perhaps
> other parameters as well) in that switch statement.

that's trivial to check:
	sed -n '/^struct exception_table_entry/,/};/p'\
		arch/*/include/asm/uaccess* include/asm-generic/uaccess.h 

and indeed, the only arches that don't follow this model are the ones that 
define ARCH_HAS_SORT_EXTABLE.

> > when i look in the kernel, we have common code behind
> > ARCH_HAS_SORT_EXTABLE. so you could easily do the same thing:
> > 
> > scripts/sortextable.c:
> > 	#ifdef ARCH_HAS_SORT_EXTABLE
> > 	
> > 		switch (w2(ehdr->e_machine)) {
> > 		
> > 		default:
> > 			fprintf(stderr, "unrecognized e_machine %d %s\n",
> > 			
> > 				w2(ehdr->e_machine), fname);
> > 			
> > 			... return a unique exit code like 77 ...
> > 			break;
> > 		
> > 		/* add arch sorting info here */
> > 		}  /* end switch */
> > 	
> > 	#endif
> > 
> > kernel/extable.c:
> > 	#if defined(ARCH_HAS_SORT_EXTABLE)&&  !defined(ARCH_HAS_SORTED_EXTABLE)
> > 	void __init sort_main_extable(void)
> > 	{
> > 	
> > 		sort_extable(__start___ex_table, __stop___ex_table);
> > 	
> > 	}
> > 	#endif
> 
> Yes, I am familiar with that code.  One thing to keep in mind is that
> the compiler has access to struct exception_table_entry, and can easily
> figure out both how big the structure is *and* where the insn field is
> within the structure.
> 
> This is not the case for the author of sortextable.  Except for MIPS,
> MIPS64, i386 and x86_64, I know neither the size of struct
> exception_table_entry, nor the offset of its insn field.

a trivial sed/grep gets you the answer: they're all the same

> > this way all the people not doing unique stuff work out of the box.  only
> > the people who are doing funky stuff need to extend things.
> 
> I didn't want to include something like this that I cannot test.  An
> unsorted (or improperly sorted) exception table is not necessarily
> something that will be noticeable by simply booting the kernel.  Your
> only indication may be a panic or OOPS under rarely encountered conditions.

this is what linux-next is for :)
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH RFC 3/5] kbuild/extable: Hook up sortextable into the build system.
From: David Daney @ 2011-11-22 21:38 UTC (permalink / raw)
  To: Michal Marek
  Cc: David Daney, linux-mips@linux-mips.org, ralf@linux-mips.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-embedded@vger.kernel.org, x86@kernel.org, David Daney
In-Reply-To: <4EC9046C.3050708@suse.cz>

On 11/20/2011 05:45 AM, Michal Marek wrote:
> On 18.11.2011 20:37, David Daney wrote:
>> +	$(if $(CONFIG_BUILDTIME_EXTABLE_SORT),				\
>> +	$(Q)$(if $($(quiet)cmd_sortextable),				\
>> +	  echo '  $($(quiet)cmd_sortextable)  vmlinux'&&)		\
>> +	  $(cmd_sortextable)  vmlinux)
>
> Why do you opencode $(call cmd,sortextable) here?
>

Mostly just mimicking the other code in the vicinity.

I will try to improve it in the next version of the patch.

Thanks,
David Daney

^ permalink raw reply

* Help needed with porting 32-bit user ioctl to run on a 64-bit kernel
From: Linux Rules @ 2011-11-30 19:42 UTC (permalink / raw)
  To: linux-embedded

All,

I am working on porting 32-bit application code to run over a 64-bit kernel.

The 32 bit code calls socket ioctl to programs some registers in the
driver (e1000) and then opens a raw socket.

The struct ifreq is used to construct the command as expected.

The problem i am facing is that, the data which is copied back into
the ioctl buffer is not visible in the user space at all.

[32-bit user space if req with command] ==> [IOCTL] ==> [64-bit
KERNEL] ==> [reads register copies result into ifreq->ifr_data in
kernel] ==> [The copied data is not visible in the user code].

Since the data i require is having a length of only 4 bytes,  i used
the ifr_data field itself as a buffer instead of using it as a pointer
to a buffer because i see that the ifreq structure is of 32 bytes
length and ifr_name is part of a union which has a size of 16 bytes.

I tried to do some debugging on this issue and found these:

the struct ifreq is 32 byte in 32-bit user space but is 40 bytes in
kernel space, this is probably because of the size of sockaddr inside
ifreq which is different for 64-bit, also copy_from/to_user is not
required because do_ioctl is already working on a copy of the ifreq
buffer.

I did some googling and found suggestions to use compat_ioctl to do
the porting, but that is pertaining to using the file_operations
structure, netdevice does not have a compat_ioctl, does it?

Any help is appreciated in this regard.

kind regards,
mz.

^ permalink raw reply

* Re: Help needed with porting 32-bit user ioctl to run on a 64-bit kernel
From: Linux Rules @ 2011-11-30 22:11 UTC (permalink / raw)
  To: linux-embedded
In-Reply-To: <CA+ib0qH_8Ex10sjYH7abbFgaQaRRwKq73exn3jehjpts+Hx=aQ@mail.gmail.com>

I did some more digging and found that there is a file compat_ioctl.c
which has this function (below) which is creating a local ifreq buffer
copy of the incoming 32-bit buffer and passing to ioctl in kernel, but
does not copy_to_user the data back for default switch case, could
this be the problem?
fs/compat_ioctl.cstatic int dev_ifsioc(unsigned int fd, unsigned int
cmd, unsigned long arg) 504{ 505        struct ifreq ifr; 506
struct ifreq32 __user *uifr32; 507        struct ifmap32 __user
*uifmap32; 508        mm_segment_t old_fs; 509        int err; 510
    511        uifr32 = compat_ptr(arg); 512        uifmap32 =
&uifr32->ifr_ifru.ifru_map; 513        switch (cmd) { 514        case
SIOCSIFMAP: 515                err = copy_from_user(&ifr, uifr32,
sizeof(ifr.ifr_name)); 516                err |=
__get_user(ifr.ifr_map.mem_start, &uifmap32->mem_start); 517
     err |= __get_user(ifr.ifr_map.mem_end, &uifmap32->mem_end); 518
             err |= __get_user(ifr.ifr_map.base_addr,
&uifmap32->base_addr); 519                err |=
__get_user(ifr.ifr_map.irq, &uifmap32->irq); 520                err |=
__get_user(ifr.ifr_map.dma, &uifmap32->dma); 521                err |=
__get_user(ifr.ifr_map.port, &uifmap32->port); 522                if
(err) 523                        return -EFAULT; 524
break; 525        default: 526                if (copy_from_user(&ifr,
uifr32, sizeof(*uifr32))) 527                        return -EFAULT;
528                break; 529        } 530        old_fs = get_fs();
531        set_fs (KERNEL_DS); 532        err = sys_ioctl (fd, cmd,
(unsigned long)&ifr); 533        set_fs (old_fs); 534        if (!err)
{ 535                switch (cmd) { 536                /* TUNSETIFF is
defined as _IOW, it should be _IORW 537                 * as the data
is copied back to user space, but that 538                 * cannot be
fixed without breaking all existing apps. 539                 */ 540
             case TUNSETIFF: 541                case SIOCGIFFLAGS: 542
               case SIOCGIFMETRIC: 543                case SIOCGIFMTU:
544                case SIOCGIFMEM: 545                case
SIOCGIFHWADDR: 546                case SIOCGIFINDEX: 547
 case SIOCGIFADDR: 548                case SIOCGIFBRDADDR: 549
       case SIOCGIFDSTADDR: 550                case SIOCGIFNETMASK:
551                case SIOCGIFTXQLEN: 552                        if
(copy_to_user(uifr32, &ifr, sizeof(*uifr32))) 553
          return -EFAULT; 554                        break; 555
        case SIOCGIFMAP: 556                        err =
copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name)); 557
      err |= __put_user(ifr.ifr_map.mem_start, &uifmap32->mem_start);
558                        err |= __put_user(ifr.ifr_map.mem_end,
&uifmap32->mem_end); 559                        err |=
__put_user(ifr.ifr_map.base_addr, &uifmap32->base_addr); 560
             err |= __put_user(ifr.ifr_map.irq, &uifmap32->irq); 561
                     err |= __put_user(ifr.ifr_map.dma,
&uifmap32->dma); 562                        err |=
__put_user(ifr.ifr_map.port, &uifmap32->port); 563
   if (err) 564                                err = -EFAULT; 565
                  break; 566                } 567        } 568
return err; 569} 570
On Wed, Nov 30, 2011 at 11:42 AM, Linux Rules <linuzoo@gmail.com> wrote:
> All,
>
> I am working on porting 32-bit application code to run over a 64-bit kernel.
>
> The 32 bit code calls socket ioctl to programs some registers in the
> driver (e1000) and then opens a raw socket.
>
> The struct ifreq is used to construct the command as expected.
>
> The problem i am facing is that, the data which is copied back into
> the ioctl buffer is not visible in the user space at all.
>
> [32-bit user space if req with command] ==> [IOCTL] ==> [64-bit
> KERNEL] ==> [reads register copies result into ifreq->ifr_data in
> kernel] ==> [The copied data is not visible in the user code].
>
> Since the data i require is having a length of only 4 bytes,  i used
> the ifr_data field itself as a buffer instead of using it as a pointer
> to a buffer because i see that the ifreq structure is of 32 bytes
> length and ifr_name is part of a union which has a size of 16 bytes.
>
> I tried to do some debugging on this issue and found these:
>
> the struct ifreq is 32 byte in 32-bit user space but is 40 bytes in
> kernel space, this is probably because of the size of sockaddr inside
> ifreq which is different for 64-bit, also copy_from/to_user is not
> required because do_ioctl is already working on a copy of the ifreq
> buffer.
>
> I did some googling and found suggestions to use compat_ioctl to do
> the porting, but that is pertaining to using the file_operations
> structure, netdevice does not have a compat_ioctl, does it?
>
> Any help is appreciated in this regard.
>
> kind regards,
> mz.

^ permalink raw reply

* Fosdem embedded and mobile devroom call for papers
From: Geert Uytterhoeven @ 2011-12-16  6:58 UTC (permalink / raw)
  To: Linux Embedded

Fosdem will be held the 4th and 5th of February 2012 in Brussels,
Belgium. As usual and for the 9th time there will be an embedded and
mobile room.

For this years program we are looking for people who would like to do
a presentation about their or their community's projects in this area.
These projects must be Free Software or Open Source.

For example involvement and experiences with projects like Arduino,
Meego/Tizen, Mer, Beagleboard, Openembedded, Android, OpenWrt, Yocto, Linaro...
Also we are interested in short tutorials, project overviews,
achievements, ports to new hardware and hardware hacking, real life deployments,
...all are welcome and all submissions will be reviewed by our panel.

Submissions require a small abstract and short speaker presentation
and should be submitted to fosdem.embedded@gmail.com.before the 15th of
january of 2012

The panel consists of:

Philippe De Swert
Peter De Schrijver
Klaas Van Gend
Dave Neary
Michael Opdenacker
Geert Uytterhoeven

Looking forward to see you all at Fosdem (http://www.fosdem.org/)!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Android mainlining project
From: Tim Bird @ 2011-12-20  0:39 UTC (permalink / raw)
  To: linux-embedded; +Cc: linux kernel

Hello all,

I would like to announce the beginning of a project to
make a concerted effort to mainline patches and features
from Android into the mainline Linux kernel.

At the recent kernel summit in Prague, there was a discussion
about re-evaluating some of the Android-specific features
that are found in Google's android kernel, and looking at
whether and how to incorporate them into the mainline (kernel.org)
kernel.

Various parties, including individual developers, the
CE workgroup in the Linux Foundation, and a group at Linaro
have been working on different aspects of this.  A wiki
page and mailing list have been created to help coordinate
our efforts and avoid duplication and confusion.

The wiki page for this effort (still under construction) is
at: http://elinux.org/Android_Mainlining_Project

There's a mailing list if you wish to follow discussions
about this project.  It's available at:
https://lists.linuxfoundation.org/mailman/listinfo/ce-android-mainline

If you are interested in this work, please join the mailing
list or send me an e-mail.  Some interesting things are already
happening.  I'll send a status message to the ce-android-mainline
list in a day or two to let interested parties know what's currently
going on.

Regards,
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply

* Re: Android mainlining project
From: Geunsik Lim @ 2011-12-20 23:27 UTC (permalink / raw)
  To: Tim Bird; +Cc: linux-embedded, linux kernel
In-Reply-To: <4EEFD926.3040201@am.sony.com>

On Tue, Dec 20, 2011 at 9:39 AM, Tim Bird <tim.bird@am.sony.com> wrote:
> Hello all,
>
> I would like to announce the beginning of a project to
> make a concerted effort to mainline patches and features
> from Android into the mainline Linux kernel.
>
> At the recent kernel summit in Prague, there was a discussion
> about re-evaluating some of the Android-specific features
> that are found in Google's android kernel, and looking at
> whether and how to incorporate them into the mainline (kernel.org)
> kernel.
>
> Various parties, including individual developers, the
> CE workgroup in the Linux Foundation, and a group at Linaro
> have been working on different aspects of this.  A wiki
> page and mailing list have been created to help coordinate
> our efforts and avoid duplication and confusion.
>
> The wiki page for this effort (still under construction) is
> at: http://elinux.org/Android_Mainlining_Project
Actually, current linux-next tree is not including ashmem(anonymous
shared memory).
When I checked android kernel features of linux-next(20111219),
Wakelock API feature(./kernel/power/*lock.c) still isn't appeared in
linux-next tree.

>
> There's a mailing list if you wish to follow discussions
> about this project.  It's available at:
> https://lists.linuxfoundation.org/mailman/listinfo/ce-android-mainline
>
> If you are interested in this work, please join the mailing
> list or send me an e-mail.  Some interesting things are already
> happening.  I'll send a status message to the ce-android-mainline
> list in a day or two to let interested parties know what's currently
> going on.
>
> Regards,
>  -- Tim
>
> =============================
> Tim Bird
> Architecture Group Chair, CE Workgroup of the Linux Foundation
> Senior Staff Engineer, Sony Network Entertainment
> =============================
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Best regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
Homepage: http://leemgs.fedorapeople.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: Android mainlining project
From: Tim Bird @ 2011-12-20 23:37 UTC (permalink / raw)
  To: Geunsik Lim; +Cc: linux-embedded, linux kernel
In-Reply-To: <CABiHV1Q2SyXX3h99tAo6Z4tmd1HUsj0KaQB5OV3qqm4Yn8P2ag@mail.gmail.com>

On 12/20/2011 03:27 PM, Geunsik Lim wrote:
> On Tue, Dec 20, 2011 at 9:39 AM, Tim Bird <tim.bird@am.sony.com> wrote:
>> The wiki page for this effort (still under construction) is
>> at: http://elinux.org/Android_Mainlining_Project
> Actually, current linux-next tree is not including ashmem(anonymous
> shared memory).
> When I checked android kernel features of linux-next(20111219),
> Wakelock API feature(./kernel/power/*lock.c) still isn't appeared in
> linux-next tree.

Both of these (wakelocks and ashmem) are listed as "not mainlined"
in the table on the wiki page.  Ashmem is apparently being
worked on (see the lwn.net article).
 -- Tim


=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply

* RFC: android logger feedback request
From: Tim Bird @ 2011-12-21 22:59 UTC (permalink / raw)
  To: linux-embedded
  Cc: linux kernel, Arnd Bergmann, john stultz, Greg KH, Brian Swetland

Hi all,

I'm looking for feedback on the Android logger code, to see what
it would take to make this code acceptable for inclusion in
the mainline kernel.

Information about the features of Android logging system
can be found at: http://elinux.org/Android_Logging_System

This system creates a new system-wide logging service, in
the kernel, for user-space message.  It is more comparable
to syslog than to the kernel log buffer, as it holds only
user-space messages.  It is optimized for write
performance, since most of the time the log is written to
and never read.  It creates multiple log channels, to prevent
an abundance of log messages in one channel from overwriting
messages in another channel.  The log channels have sizes
fixed at kernel compile-time.

Log messages are stored in very simple in-kernel buffers, that
overflow old messages upon wrapping.  A fixed set of attributes
(pid, tid, timestamp and message), is kept for each message.
By convention, Android puts a message priority and context tag
into each message.

In Android, this system uses a fixed set of device nodes with
well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
and /dev/log/system.

Operations on the log are done via a character device, using
standard file operations and some ioctls.

The code for this is below (I've moved it from linux-next
drivers/staging/android for my own testing).

Please let me know what issues you see with this code.

One specific question I have is where is the most appropriate
place for this code to live, in the kernel source tree?
Other embedded systems might want to use this system (it
is simpler than syslog, and superior in some ways), so I don't
think it should remain in an android-specific directory.

Thanks,
 -- Tim

P.S. Sorry to do this right before the holidays.  I'll be
away from my work machine for most of the next few weeks.  I
can answer informational questions, and gather feedback,
but I won't have a lot of time for testing new code until
after the New Year.


Here's the code:
diff --git a/drivers/misc/logger.c b/drivers/misc/logger.c
new file mode 100644
index 0000000..fa76ce7
--- /dev/null
+++ b/drivers/misc/logger.c
@@ -0,0 +1,616 @@
+/*
+ * drivers/misc/logger.c
+ *
+ * A Logging Subsystem
+ *
+ * Copyright (C) 2007-2008 Google, Inc.
+ *
+ * Robert Love <rlove@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/uaccess.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/time.h>
+#include "logger.h"
+
+#include <asm/ioctls.h>
+
+/*
+ * struct logger_log - represents a specific log, such as 'main' or 'radio'
+ *
+ * This structure lives from module insertion until module removal, so it does
+ * not need additional reference counting. The structure is protected by the
+ * mutex 'mutex'.
+ */
+struct logger_log {
+	unsigned char 		*buffer;/* the ring buffer itself */
+	struct miscdevice	misc;	/* misc device representing the log */
+	wait_queue_head_t	wq;	/* wait queue for readers */
+	struct list_head	readers; /* this log's readers */
+	struct mutex		mutex;	/* mutex protecting buffer */
+	size_t			w_off;	/* current write head offset */
+	size_t			head;	/* new readers start here */
+	size_t			size;	/* size of the log */
+};
+
+/*
+ * struct logger_reader - a logging device open for reading
+ *
+ * This object lives from open to release, so we don't need additional
+ * reference counting. The structure is protected by log->mutex.
+ */
+struct logger_reader {
+	struct logger_log	*log;	/* associated log */
+	struct list_head	list;	/* entry in logger_log's list */
+	size_t			r_off;	/* current read head offset */
+};
+
+/* logger_offset - returns index 'n' into the log via (optimized) modulus */
+#define logger_offset(n)	((n) & (log->size - 1))
+
+/*
+ * file_get_log - Given a file structure, return the associated log
+ *
+ * This isn't aesthetic. We have several goals:
+ *
+ * 	1) Need to quickly obtain the associated log during an I/O operation
+ * 	2) Readers need to maintain state (logger_reader)
+ * 	3) Writers need to be very fast (open() should be a near no-op)
+ *
+ * In the reader case, we can trivially go file->logger_reader->logger_log.
+ * For a writer, we don't want to maintain a logger_reader, so we just go
+ * file->logger_log. Thus what file->private_data points at depends on whether
+ * or not the file was opened for reading. This function hides that dirtiness.
+ */
+static inline struct logger_log *file_get_log(struct file *file)
+{
+	if (file->f_mode & FMODE_READ) {
+		struct logger_reader *reader = file->private_data;
+		return reader->log;
+	} else
+		return file->private_data;
+}
+
+/*
+ * get_entry_len - Grabs the length of the payload of the next entry starting
+ * from 'off'.
+ *
+ * Caller needs to hold log->mutex.
+ */
+static __u32 get_entry_len(struct logger_log *log, size_t off)
+{
+	__u16 val;
+
+	switch (log->size - off) {
+	case 1:
+		memcpy(&val, log->buffer + off, 1);
+		memcpy(((char *) &val) + 1, log->buffer, 1);
+		break;
+	default:
+		memcpy(&val, log->buffer + off, 2);
+	}
+
+	return sizeof(struct logger_entry) + val;
+}
+
+/*
+ * do_read_log_to_user - reads exactly 'count' bytes from 'log' into the
+ * user-space buffer 'buf'. Returns 'count' on success.
+ *
+ * Caller must hold log->mutex.
+ */
+static ssize_t do_read_log_to_user(struct logger_log *log,
+				   struct logger_reader *reader,
+				   char __user *buf,
+				   size_t count)
+{
+	size_t len;
+
+	/*
+	 * We read from the log in two disjoint operations. First, we read from
+	 * the current read head offset up to 'count' bytes or to the end of
+	 * the log, whichever comes first.
+	 */
+	len = min(count, log->size - reader->r_off);
+	if (copy_to_user(buf, log->buffer + reader->r_off, len))
+		return -EFAULT;
+
+	/*
+	 * Second, we read any remaining bytes, starting back at the head of
+	 * the log.
+	 */
+	if (count != len)
+		if (copy_to_user(buf + len, log->buffer, count - len))
+			return -EFAULT;
+
+	reader->r_off = logger_offset(reader->r_off + count);
+
+	return count;
+}
+
+/*
+ * logger_read - our log's read() method
+ *
+ * Behavior:
+ *
+ * 	- O_NONBLOCK works
+ * 	- If there are no log entries to read, blocks until log is written to
+ * 	- Atomically reads exactly one log entry
+ *
+ * Optimal read size is LOGGER_ENTRY_MAX_LEN. Will set errno to EINVAL if read
+ * buffer is insufficient to hold next entry.
+ */
+static ssize_t logger_read(struct file *file, char __user *buf,
+			   size_t count, loff_t *pos)
+{
+	struct logger_reader *reader = file->private_data;
+	struct logger_log *log = reader->log;
+	ssize_t ret;
+	DEFINE_WAIT(wait);
+
+start:
+	while (1) {
+		prepare_to_wait(&log->wq, &wait, TASK_INTERRUPTIBLE);
+
+		mutex_lock(&log->mutex);
+		ret = (log->w_off == reader->r_off);
+		mutex_unlock(&log->mutex);
+		if (!ret)
+			break;
+
+		if (file->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			break;
+		}
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		schedule();
+	}
+
+	finish_wait(&log->wq, &wait);
+	if (ret)
+		return ret;
+
+	mutex_lock(&log->mutex);
+
+	/* is there still something to read or did we race? */
+	if (unlikely(log->w_off == reader->r_off)) {
+		mutex_unlock(&log->mutex);
+		goto start;
+	}
+
+	/* get the size of the next entry */
+	ret = get_entry_len(log, reader->r_off);
+	if (count < ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* get exactly one entry from the log */
+	ret = do_read_log_to_user(log, reader, buf, ret);
+
+out:
+	mutex_unlock(&log->mutex);
+
+	return ret;
+}
+
+/*
+ * get_next_entry - return the offset of the first valid entry at least 'len'
+ * bytes after 'off'.
+ *
+ * Caller must hold log->mutex.
+ */
+static size_t get_next_entry(struct logger_log *log, size_t off, size_t len)
+{
+	size_t count = 0;
+
+	do {
+		size_t nr = get_entry_len(log, off);
+		off = logger_offset(off + nr);
+		count += nr;
+	} while (count < len);
+
+	return off;
+}
+
+/*
+ * clock_interval - is a < c < b in mod-space? Put another way, does the line
+ * from a to b cross c?
+ */
+static inline int clock_interval(size_t a, size_t b, size_t c)
+{
+	if (b < a) {
+		if (a < c || b >= c)
+			return 1;
+	} else {
+		if (a < c && b >= c)
+			return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * fix_up_readers - walk the list of all readers and "fix up" any who were
+ * lapped by the writer; also do the same for the default "start head".
+ * We do this by "pulling forward" the readers and start head to the first
+ * entry after the new write head.
+ *
+ * The caller needs to hold log->mutex.
+ */
+static void fix_up_readers(struct logger_log *log, size_t len)
+{
+	size_t old = log->w_off;
+	size_t new = logger_offset(old + len);
+	struct logger_reader *reader;
+
+	if (clock_interval(old, new, log->head))
+		log->head = get_next_entry(log, log->head, len);
+
+	list_for_each_entry(reader, &log->readers, list)
+		if (clock_interval(old, new, reader->r_off))
+			reader->r_off = get_next_entry(log, reader->r_off, len);
+}
+
+/*
+ * do_write_log - writes 'len' bytes from 'buf' to 'log'
+ *
+ * The caller needs to hold log->mutex.
+ */
+static void do_write_log(struct logger_log *log, const void *buf, size_t count)
+{
+	size_t len;
+
+	len = min(count, log->size - log->w_off);
+	memcpy(log->buffer + log->w_off, buf, len);
+
+	if (count != len)
+		memcpy(log->buffer, buf + len, count - len);
+
+	log->w_off = logger_offset(log->w_off + count);
+
+}
+
+/*
+ * do_write_log_user - writes 'len' bytes from the user-space buffer 'buf' to
+ * the log 'log'
+ *
+ * The caller needs to hold log->mutex.
+ *
+ * Returns 'count' on success, negative error code on failure.
+ */
+static ssize_t do_write_log_from_user(struct logger_log *log,
+				      const void __user *buf, size_t count)
+{
+	size_t len;
+
+	len = min(count, log->size - log->w_off);
+	if (len && copy_from_user(log->buffer + log->w_off, buf, len))
+		return -EFAULT;
+
+	if (count != len)
+		if (copy_from_user(log->buffer, buf + len, count - len))
+			return -EFAULT;
+
+	log->w_off = logger_offset(log->w_off + count);
+
+	return count;
+}
+
+/*
+ * logger_aio_write - our write method, implementing support for write(),
+ * writev(), and aio_write(). Writes are our fast path, and we try to optimize
+ * them above all else.
+ */
+ssize_t logger_aio_write(struct kiocb *iocb, const struct iovec *iov,
+			 unsigned long nr_segs, loff_t ppos)
+{
+	struct logger_log *log = file_get_log(iocb->ki_filp);
+	size_t orig = log->w_off;
+	struct logger_entry header;
+	struct timespec now;
+	ssize_t ret = 0;
+
+	now = current_kernel_time();
+
+	header.pid = current->tgid;
+	header.tid = current->pid;
+	header.sec = now.tv_sec;
+	header.nsec = now.tv_nsec;
+	header.len = min_t(size_t, iocb->ki_left, LOGGER_ENTRY_MAX_PAYLOAD);
+
+	/* null writes succeed, return zero */
+	if (unlikely(!header.len))
+		return 0;
+
+	mutex_lock(&log->mutex);
+
+	/*
+	 * Fix up any readers, pulling them forward to the first readable
+	 * entry after (what will be) the new write offset. We do this now
+	 * because if we partially fail, we can end up with clobbered log
+	 * entries that encroach on readable buffer.
+	 */
+	fix_up_readers(log, sizeof(struct logger_entry) + header.len);
+
+	do_write_log(log, &header, sizeof(struct logger_entry));
+
+	while (nr_segs-- > 0) {
+		size_t len;
+		ssize_t nr;
+
+		/* figure out how much of this vector we can keep */
+		len = min_t(size_t, iov->iov_len, header.len - ret);
+
+		/* write out this segment's payload */
+		nr = do_write_log_from_user(log, iov->iov_base, len);
+		if (unlikely(nr < 0)) {
+			log->w_off = orig;
+			mutex_unlock(&log->mutex);
+			return nr;
+		}
+
+		iov++;
+		ret += nr;
+	}
+
+	mutex_unlock(&log->mutex);
+
+	/* wake up any blocked readers */
+	wake_up_interruptible(&log->wq);
+
+	return ret;
+}
+
+static struct logger_log *get_log_from_minor(int);
+
+/*
+ * logger_open - the log's open() file operation
+ *
+ * Note how near a no-op this is in the write-only case. Keep it that way!
+ */
+static int logger_open(struct inode *inode, struct file *file)
+{
+	struct logger_log *log;
+	int ret;
+
+	ret = nonseekable_open(inode, file);
+	if (ret)
+		return ret;
+
+	log = get_log_from_minor(MINOR(inode->i_rdev));
+	if (!log)
+		return -ENODEV;
+
+	if (file->f_mode & FMODE_READ) {
+		struct logger_reader *reader;
+
+		reader = kmalloc(sizeof(struct logger_reader), GFP_KERNEL);
+		if (!reader)
+			return -ENOMEM;
+
+		reader->log = log;
+		INIT_LIST_HEAD(&reader->list);
+
+		mutex_lock(&log->mutex);
+		reader->r_off = log->head;
+		list_add_tail(&reader->list, &log->readers);
+		mutex_unlock(&log->mutex);
+
+		file->private_data = reader;
+	} else
+		file->private_data = log;
+
+	return 0;
+}
+
+/*
+ * logger_release - the log's release file operation
+ *
+ * Note this is a total no-op in the write-only case. Keep it that way!
+ */
+static int logger_release(struct inode *ignored, struct file *file)
+{
+	if (file->f_mode & FMODE_READ) {
+		struct logger_reader *reader = file->private_data;
+		list_del(&reader->list);
+		kfree(reader);
+	}
+
+	return 0;
+}
+
+/*
+ * logger_poll - the log's poll file operation, for poll/select/epoll
+ *
+ * Note we always return POLLOUT, because you can always write() to the log.
+ * Note also that, strictly speaking, a return value of POLLIN does not
+ * guarantee that the log is readable without blocking, as there is a small
+ * chance that the writer can lap the reader in the interim between poll()
+ * returning and the read() request.
+ */
+static unsigned int logger_poll(struct file *file, poll_table *wait)
+{
+	struct logger_reader *reader;
+	struct logger_log *log;
+	unsigned int ret = POLLOUT | POLLWRNORM;
+
+	if (!(file->f_mode & FMODE_READ))
+		return ret;
+
+	reader = file->private_data;
+	log = reader->log;
+
+	poll_wait(file, &log->wq, wait);
+
+	mutex_lock(&log->mutex);
+	if (log->w_off != reader->r_off)
+		ret |= POLLIN | POLLRDNORM;
+	mutex_unlock(&log->mutex);
+
+	return ret;
+}
+
+static long logger_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct logger_log *log = file_get_log(file);
+	struct logger_reader *reader;
+	long ret = -ENOTTY;
+
+	mutex_lock(&log->mutex);
+
+	switch (cmd) {
+	case LOGGER_GET_LOG_BUF_SIZE:
+		ret = log->size;
+		break;
+	case LOGGER_GET_LOG_LEN:
+		if (!(file->f_mode & FMODE_READ)) {
+			ret = -EBADF;
+			break;
+		}
+		reader = file->private_data;
+		if (log->w_off >= reader->r_off)
+			ret = log->w_off - reader->r_off;
+		else
+			ret = (log->size - reader->r_off) + log->w_off;
+		break;
+	case LOGGER_GET_NEXT_ENTRY_LEN:
+		if (!(file->f_mode & FMODE_READ)) {
+			ret = -EBADF;
+			break;
+		}
+		reader = file->private_data;
+		if (log->w_off != reader->r_off)
+			ret = get_entry_len(log, reader->r_off);
+		else
+			ret = 0;
+		break;
+	case LOGGER_FLUSH_LOG:
+		if (!(file->f_mode & FMODE_WRITE)) {
+			ret = -EBADF;
+			break;
+		}
+		list_for_each_entry(reader, &log->readers, list)
+			reader->r_off = log->w_off;
+		log->head = log->w_off;
+		ret = 0;
+		break;
+	}
+
+	mutex_unlock(&log->mutex);
+
+	return ret;
+}
+
+static const struct file_operations logger_fops = {
+	.owner = THIS_MODULE,
+	.read = logger_read,
+	.aio_write = logger_aio_write,
+	.poll = logger_poll,
+	.unlocked_ioctl = logger_ioctl,
+	.compat_ioctl = logger_ioctl,
+	.open = logger_open,
+	.release = logger_release,
+};
+
+/*
+ * Defines a log structure with name 'NAME' and a size of 'SIZE' bytes, which
+ * must be a power of two, greater than LOGGER_ENTRY_MAX_LEN, and less than
+ * LONG_MAX minus LOGGER_ENTRY_MAX_LEN.
+ */
+#define DEFINE_LOGGER_DEVICE(VAR, NAME, SIZE) \
+static unsigned char _buf_ ## VAR[SIZE]; \
+static struct logger_log VAR = { \
+	.buffer = _buf_ ## VAR, \
+	.misc = { \
+		.minor = MISC_DYNAMIC_MINOR, \
+		.name = NAME, \
+		.fops = &logger_fops, \
+		.parent = NULL, \
+	}, \
+	.wq = __WAIT_QUEUE_HEAD_INITIALIZER(VAR .wq), \
+	.readers = LIST_HEAD_INIT(VAR .readers), \
+	.mutex = __MUTEX_INITIALIZER(VAR .mutex), \
+	.w_off = 0, \
+	.head = 0, \
+	.size = SIZE, \
+};
+
+DEFINE_LOGGER_DEVICE(log_main, LOGGER_LOG_MAIN, 256*1024)
+DEFINE_LOGGER_DEVICE(log_events, LOGGER_LOG_EVENTS, 256*1024)
+DEFINE_LOGGER_DEVICE(log_radio, LOGGER_LOG_RADIO, 256*1024)
+DEFINE_LOGGER_DEVICE(log_system, LOGGER_LOG_SYSTEM, 256*1024)
+
+static struct logger_log *get_log_from_minor(int minor)
+{
+	if (log_main.misc.minor == minor)
+		return &log_main;
+	if (log_events.misc.minor == minor)
+		return &log_events;
+	if (log_radio.misc.minor == minor)
+		return &log_radio;
+	if (log_system.misc.minor == minor)
+		return &log_system;
+	return NULL;
+}
+
+static int __init init_log(struct logger_log *log)
+{
+	int ret;
+
+	ret = misc_register(&log->misc);
+	if (unlikely(ret)) {
+		printk(KERN_ERR "logger: failed to register misc "
+		       "device for log '%s'!\n", log->misc.name);
+		return ret;
+	}
+
+	printk(KERN_INFO "logger: created %luK log '%s'\n",
+	       (unsigned long) log->size >> 10, log->misc.name);
+
+	return 0;
+}
+
+static int __init logger_init(void)
+{
+	int ret;
+
+	ret = init_log(&log_main);
+	if (unlikely(ret))
+		goto out;
+
+	ret = init_log(&log_events);
+	if (unlikely(ret))
+		goto out;
+
+	ret = init_log(&log_radio);
+	if (unlikely(ret))
+		goto out;
+
+	ret = init_log(&log_system);
+	if (unlikely(ret))
+		goto out;
+
+out:
+	return ret;
+}
+device_initcall(logger_init);
diff --git a/drivers/misc/logger.h b/drivers/misc/logger.h
new file mode 100644
index 0000000..2cb06e9
--- /dev/null
+++ b/drivers/misc/logger.h
@@ -0,0 +1,49 @@
+/* include/linux/logger.h
+ *
+ * Copyright (C) 2007-2008 Google, Inc.
+ * Author: Robert Love <rlove@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_LOGGER_H
+#define _LINUX_LOGGER_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct logger_entry {
+	__u16		len;	/* length of the payload */
+	__u16		__pad;	/* no matter what, we get 2 bytes of padding */
+	__s32		pid;	/* generating process's pid */
+	__s32		tid;	/* generating process's tid */
+	__s32		sec;	/* seconds since Epoch */
+	__s32		nsec;	/* nanoseconds */
+	char		msg[0];	/* the entry's payload */
+};
+
+#define LOGGER_LOG_RADIO	"log_radio"	/* radio-related messages */
+#define LOGGER_LOG_EVENTS	"log_events"	/* system/hardware events */
+#define LOGGER_LOG_SYSTEM	"log_system"	/* system/framework messages */
+#define LOGGER_LOG_MAIN		"log_main"	/* everything else */
+
+#define LOGGER_ENTRY_MAX_LEN		(4*1024)
+#define LOGGER_ENTRY_MAX_PAYLOAD	\
+	(LOGGER_ENTRY_MAX_LEN - sizeof(struct logger_entry))
+
+#define __LOGGERIO	0xAE
+
+#define LOGGER_GET_LOG_BUF_SIZE		_IO(__LOGGERIO, 1) /* size of log */
+#define LOGGER_GET_LOG_LEN		_IO(__LOGGERIO, 2) /* used log len */
+#define LOGGER_GET_NEXT_ENTRY_LEN	_IO(__LOGGERIO, 3) /* next entry len */
+#define LOGGER_FLUSH_LOG		_IO(__LOGGERIO, 4) /* flush log */
+
+#endif /* _LINUX_LOGGER_H */

^ permalink raw reply related

* Re: RFC: android logger feedback request
From: Greg KH @ 2011-12-21 23:19 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <4EF264C3.6000104@am.sony.com>

On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
> Hi all,
> 
> I'm looking for feedback on the Android logger code, to see what
> it would take to make this code acceptable for inclusion in
> the mainline kernel.
> 
> Information about the features of Android logging system
> can be found at: http://elinux.org/Android_Logging_System
> 
> This system creates a new system-wide logging service, in
> the kernel, for user-space message.  It is more comparable
> to syslog than to the kernel log buffer, as it holds only
> user-space messages.  It is optimized for write
> performance, since most of the time the log is written to
> and never read.  It creates multiple log channels, to prevent
> an abundance of log messages in one channel from overwriting
> messages in another channel.  The log channels have sizes
> fixed at kernel compile-time.
> 
> Log messages are stored in very simple in-kernel buffers, that
> overflow old messages upon wrapping.  A fixed set of attributes
> (pid, tid, timestamp and message), is kept for each message.
> By convention, Android puts a message priority and context tag
> into each message.
> 
> In Android, this system uses a fixed set of device nodes with
> well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> and /dev/log/system.
> 
> Operations on the log are done via a character device, using
> standard file operations and some ioctls.
> 
> The code for this is below (I've moved it from linux-next
> drivers/staging/android for my own testing).
> 
> Please let me know what issues you see with this code.

That all describes the current code, but you haven't described what's
wrong with the existing syslog interface that requires this new driver
to be written.  And why can't the existing interface be fixed to address
these (potential) shortcomings?

> One specific question I have is where is the most appropriate
> place for this code to live, in the kernel source tree?
> Other embedded systems might want to use this system (it
> is simpler than syslog, and superior in some ways), so I don't
> think it should remain in an android-specific directory.

What way is it superior?  Again, why not extend syslog?  Why not "fix"
syslog if this really is a superior thing?  How does this tie into Kay
and Lennard's proposal for work in this area?

thanks,

greg k-h

^ permalink raw reply

* Re: RFC: android logger feedback request
From: john stultz @ 2011-12-22  0:18 UTC (permalink / raw)
  To: Greg KH
  Cc: Tim Bird, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <20111221231956.GB23859@suse.de>

On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
> On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
> > Hi all,
> > 
> > I'm looking for feedback on the Android logger code, to see what
> > it would take to make this code acceptable for inclusion in
> > the mainline kernel.
> > 
> > Information about the features of Android logging system
> > can be found at: http://elinux.org/Android_Logging_System
> > 
> > This system creates a new system-wide logging service, in
> > the kernel, for user-space message.  It is more comparable
> > to syslog than to the kernel log buffer, as it holds only
> > user-space messages.  It is optimized for write
> > performance, since most of the time the log is written to
> > and never read.  It creates multiple log channels, to prevent
> > an abundance of log messages in one channel from overwriting
> > messages in another channel.  The log channels have sizes
> > fixed at kernel compile-time.
> > 
> > Log messages are stored in very simple in-kernel buffers, that
> > overflow old messages upon wrapping.  A fixed set of attributes
> > (pid, tid, timestamp and message), is kept for each message.
> > By convention, Android puts a message priority and context tag
> > into each message.
> > 
> > In Android, this system uses a fixed set of device nodes with
> > well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> > and /dev/log/system.
> > 
> > Operations on the log are done via a character device, using
> > standard file operations and some ioctls.
> > 
> > The code for this is below (I've moved it from linux-next
> > drivers/staging/android for my own testing).
> > 
> > Please let me know what issues you see with this code.
> 
> That all describes the current code, but you haven't described what's
> wrong with the existing syslog interface that requires this new driver
> to be written.  And why can't the existing interface be fixed to address
> these (potential) shortcomings?
> 
> > One specific question I have is where is the most appropriate
> > place for this code to live, in the kernel source tree?
> > Other embedded systems might want to use this system (it
> > is simpler than syslog, and superior in some ways), so I don't
> > think it should remain in an android-specific directory.
> 
> What way is it superior?  Again, why not extend syslog?  Why not "fix"
> syslog if this really is a superior thing?  How does this tie into Kay
> and Lennard's proposal for work in this area?

There is also some overlap functionality wise with pstore as well, as I
believe the logger is used as a known location in memory where messages
can be fetched from after a kernel panic or crash.

thanks
-john

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Greg KH @ 2011-12-22  0:27 UTC (permalink / raw)
  To: john stultz
  Cc: Tim Bird, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <1324513091.30527.139.camel@work-vm>

On Wed, Dec 21, 2011 at 04:18:11PM -0800, john stultz wrote:
> On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
> > On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
> > > Hi all,
> > > 
> > > I'm looking for feedback on the Android logger code, to see what
> > > it would take to make this code acceptable for inclusion in
> > > the mainline kernel.
> > > 
> > > Information about the features of Android logging system
> > > can be found at: http://elinux.org/Android_Logging_System
> > > 
> > > This system creates a new system-wide logging service, in
> > > the kernel, for user-space message.  It is more comparable
> > > to syslog than to the kernel log buffer, as it holds only
> > > user-space messages.  It is optimized for write
> > > performance, since most of the time the log is written to
> > > and never read.  It creates multiple log channels, to prevent
> > > an abundance of log messages in one channel from overwriting
> > > messages in another channel.  The log channels have sizes
> > > fixed at kernel compile-time.
> > > 
> > > Log messages are stored in very simple in-kernel buffers, that
> > > overflow old messages upon wrapping.  A fixed set of attributes
> > > (pid, tid, timestamp and message), is kept for each message.
> > > By convention, Android puts a message priority and context tag
> > > into each message.
> > > 
> > > In Android, this system uses a fixed set of device nodes with
> > > well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> > > and /dev/log/system.
> > > 
> > > Operations on the log are done via a character device, using
> > > standard file operations and some ioctls.
> > > 
> > > The code for this is below (I've moved it from linux-next
> > > drivers/staging/android for my own testing).
> > > 
> > > Please let me know what issues you see with this code.
> > 
> > That all describes the current code, but you haven't described what's
> > wrong with the existing syslog interface that requires this new driver
> > to be written.  And why can't the existing interface be fixed to address
> > these (potential) shortcomings?
> > 
> > > One specific question I have is where is the most appropriate
> > > place for this code to live, in the kernel source tree?
> > > Other embedded systems might want to use this system (it
> > > is simpler than syslog, and superior in some ways), so I don't
> > > think it should remain in an android-specific directory.
> > 
> > What way is it superior?  Again, why not extend syslog?  Why not "fix"
> > syslog if this really is a superior thing?  How does this tie into Kay
> > and Lennard's proposal for work in this area?
> 
> There is also some overlap functionality wise with pstore as well, as I
> believe the logger is used as a known location in memory where messages
> can be fetched from after a kernel panic or crash.

That sounds like a major overlap, can't pstore be used for this
functionality today instead of the logger code?

greg k-h

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Tim Bird @ 2011-12-22  0:36 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <20111221231956.GB23859@suse.de>

On 12/21/2011 03:19 PM, Greg KH wrote:
> That all describes the current code, but you haven't described what's
> wrong with the existing syslog interface that requires this new driver
> to be written.  And why can't the existing interface be fixed to address
> these (potential) shortcomings?


>> One specific question I have is where is the most appropriate
>> place for this code to live, in the kernel source tree?
>> Other embedded systems might want to use this system (it
>> is simpler than syslog, and superior in some ways), so I don't
>> think it should remain in an android-specific directory.
> 
> What way is it superior?

Here are some ways that this code is superior to syslog:

Speed and overhead
------------------
syslogd requires a separate user-space process to manage
the log area, where this code does not.  The overhead
for any user-space process is at least 16K, and usually much
more than this (not including the size of the log storage
area).  On one of my embedded systems, where syslogd is
provided by busybox, the unique set size for syslogd
is 96K.  This code, built in to the Linux kernel is less
than 4K code and data (again, not including the size of
the log storage area).

To deliver a message to syslog requires a socket operation
and a few context switches.  With the logger code,
the operation is a file operation (writev) to kernel memory,
with only one context switch into and out of the kernel.

The open and write paths through the Linux kernel are
arguably more optimized than the networking paths.
This is a consideration since the log operations should
optimized for the "create-a-log-entry" case (the open/write
path), since logs are mostly written and almost never read
on production devices.

No dependence on persistent storage
-----------------------------------
syslogd requires either persistent storage to store the log,
or a network connection to an outside device.  Being
purely memory-based, the logger requires neither of these.
With logger, persistence of the log data is left to the
implementor.  In Android, the data is delivered over a USB
connection via adb or to the console as ascii text, using
logcat.  In other embedded systems, other mechanisms might
be used if long-term storage of the messages is desired.
With logger, there is no automatic notion of on-device
persistent storage for the log data.

No dependence on networking kernel code
---------------------------------------
The syslog communication mechanism requires sockets.  This
prevents one from configuring the kernel with no networking
support, which is sometimes done in embedded systems to save
size.

Simpler constraint on log size
------------------------------
The busybox syslog daemon uses a log rotation feature to constrain
the size of the log in persistent storage.  This is overly
cumbersome in both speed and complexity compared to the logger's
simple ring buffer.

Licensing
---------
The code implementing library and command line tool support
for this logger (in user space) is available under an Apache license,
rather than a GPL license, which is desirable for some vendors.

> Again, why not extend syslog?  Why not "fix"
> syslog if this really is a superior thing?

"extend" syslog would not really the the right
direction.  This system is simpler than syslog,
while simultaneously having at least one valuable
extra feature (separate log channels).

syslog has a standard set of interfaces in libc
and various syslogd implementations, which are
heavier weight in nature than what is provided here.
It is unclear that an attempt at reducing these attributes
(such as memory overhead, number of context switches,
dependence on persistent storage, and socket utilization) would
yield a system substantially different from the logger.

> How does this tie into Kay
> and Lennard's proposal for work in this area?

It does not tie in at all.

Kay and Lennart's proposal for "the Journal" creates
a more complex system than syslog, and handles a number
of new interesting use cases.  This system is on the
opposite side of the spectrum from the journal, towards
simplicity and reduced footprint and overhead.

 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Tim Bird @ 2011-12-22  0:42 UTC (permalink / raw)
  To: john stultz
  Cc: Greg KH, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <1324513091.30527.139.camel@work-vm>

On 12/21/2011 04:18 PM, john stultz wrote:
> On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
>> On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
>>> Hi all,
>>>
>>> I'm looking for feedback on the Android logger code, to see what
>>> it would take to make this code acceptable for inclusion in
>>> the mainline kernel.
>>>
>>> Information about the features of Android logging system
>>> can be found at: http://elinux.org/Android_Logging_System
>>>
>>> This system creates a new system-wide logging service, in
>>> the kernel, for user-space message.  It is more comparable
>>> to syslog than to the kernel log buffer, as it holds only
>>> user-space messages.  It is optimized for write
>>> performance, since most of the time the log is written to
>>> and never read.  It creates multiple log channels, to prevent
>>> an abundance of log messages in one channel from overwriting
>>> messages in another channel.  The log channels have sizes
>>> fixed at kernel compile-time.
>>>
>>> Log messages are stored in very simple in-kernel buffers, that
>>> overflow old messages upon wrapping.  A fixed set of attributes
>>> (pid, tid, timestamp and message), is kept for each message.
>>> By convention, Android puts a message priority and context tag
>>> into each message.
>>>
>>> In Android, this system uses a fixed set of device nodes with
>>> well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
>>> and /dev/log/system.
>>>
>>> Operations on the log are done via a character device, using
>>> standard file operations and some ioctls.
>>>
>>> The code for this is below (I've moved it from linux-next
>>> drivers/staging/android for my own testing).
>>>
>>> Please let me know what issues you see with this code.
>>
>> That all describes the current code, but you haven't described what's
>> wrong with the existing syslog interface that requires this new driver
>> to be written.  And why can't the existing interface be fixed to address
>> these (potential) shortcomings?
>>
>>> One specific question I have is where is the most appropriate
>>> place for this code to live, in the kernel source tree?
>>> Other embedded systems might want to use this system (it
>>> is simpler than syslog, and superior in some ways), so I don't
>>> think it should remain in an android-specific directory.
>>
>> What way is it superior?  Again, why not extend syslog?  Why not "fix"
>> syslog if this really is a superior thing?  How does this tie into Kay
>> and Lennard's proposal for work in this area?
> 
> There is also some overlap functionality wise with pstore as well, as I
> believe the logger is used as a known location in memory where messages
> can be fetched from after a kernel panic or crash.

I don't know if that's true or not.  I think you may be thinking
of Android's RAM console feature.  If there's a way to save
application messages over a reboot using the logger buffers, I'm
unfamiliar with it.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply

* Re: RFC: android logger feedback request
From: john stultz @ 2011-12-22  0:47 UTC (permalink / raw)
  To: Greg KH
  Cc: Tim Bird, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <20111222002719.GA18361@suse.de>

On Wed, 2011-12-21 at 16:27 -0800, Greg KH wrote:
> On Wed, Dec 21, 2011 at 04:18:11PM -0800, john stultz wrote:
> > On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
> > > What way is it superior?  Again, why not extend syslog?  Why not "fix"
> > > syslog if this really is a superior thing?  How does this tie into Kay
> > > and Lennard's proposal for work in this area?
> > 
> > There is also some overlap functionality wise with pstore as well, as I
> > believe the logger is used as a known location in memory where messages
> > can be fetched from after a kernel panic or crash.
> 
> That sounds like a major overlap, can't pstore be used for this
> functionality today instead of the logger code?

It may be able to suffice for some portion of the functionality, it was
suggested that we consider doing something like try to make logger use
pstore as a backend.  I'm not really familiar with pstore, but I don't
know if its able to preserve messages from userland, so it may not
really be sufficient in of itself.

Also, if I understand correctly, logger also is able to store messages
from hardware modems or other hardware that run their own micro-os in a
central location. This may be closer to the pstore mce bits, but its not
clear yet to me how much overlap is there.

But I've not yet really spent too much time looking at logger yet, so I
can't really comment too strongly here. Just wanted to make sure folks
have looked at all the other similar work.

thanks
-john

^ permalink raw reply

* Re: RFC: android logger feedback request
From: john stultz @ 2011-12-22  0:49 UTC (permalink / raw)
  To: Tim Bird
  Cc: Greg KH, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <4EF27CDD.9060801@am.sony.com>

On Wed, 2011-12-21 at 16:42 -0800, Tim Bird wrote:
> On 12/21/2011 04:18 PM, john stultz wrote:
> > On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
> >> On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
> >>> Hi all,
> >>>
> >>> I'm looking for feedback on the Android logger code, to see what
> >>> it would take to make this code acceptable for inclusion in
> >>> the mainline kernel.
> >>>
> >>> Information about the features of Android logging system
> >>> can be found at: http://elinux.org/Android_Logging_System
> >>>
> >>> This system creates a new system-wide logging service, in
> >>> the kernel, for user-space message.  It is more comparable
> >>> to syslog than to the kernel log buffer, as it holds only
> >>> user-space messages.  It is optimized for write
> >>> performance, since most of the time the log is written to
> >>> and never read.  It creates multiple log channels, to prevent
> >>> an abundance of log messages in one channel from overwriting
> >>> messages in another channel.  The log channels have sizes
> >>> fixed at kernel compile-time.
> >>>
> >>> Log messages are stored in very simple in-kernel buffers, that
> >>> overflow old messages upon wrapping.  A fixed set of attributes
> >>> (pid, tid, timestamp and message), is kept for each message.
> >>> By convention, Android puts a message priority and context tag
> >>> into each message.
> >>>
> >>> In Android, this system uses a fixed set of device nodes with
> >>> well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> >>> and /dev/log/system.
> >>>
> >>> Operations on the log are done via a character device, using
> >>> standard file operations and some ioctls.
> >>>
> >>> The code for this is below (I've moved it from linux-next
> >>> drivers/staging/android for my own testing).
> >>>
> >>> Please let me know what issues you see with this code.
> >>
> >> That all describes the current code, but you haven't described what's
> >> wrong with the existing syslog interface that requires this new driver
> >> to be written.  And why can't the existing interface be fixed to address
> >> these (potential) shortcomings?
> >>
> >>> One specific question I have is where is the most appropriate
> >>> place for this code to live, in the kernel source tree?
> >>> Other embedded systems might want to use this system (it
> >>> is simpler than syslog, and superior in some ways), so I don't
> >>> think it should remain in an android-specific directory.
> >>
> >> What way is it superior?  Again, why not extend syslog?  Why not "fix"
> >> syslog if this really is a superior thing?  How does this tie into Kay
> >> and Lennard's proposal for work in this area?
> > 
> > There is also some overlap functionality wise with pstore as well, as I
> > believe the logger is used as a known location in memory where messages
> > can be fetched from after a kernel panic or crash.
> 
> I don't know if that's true or not.  I think you may be thinking
> of Android's RAM console feature.  If there's a way to save
> application messages over a reboot using the logger buffers, I'm
> unfamiliar with it.

I may very well be confusing things. I had thought apanic pulled the
logger data, but maybe I'm wrong.

thanks
-john


^ permalink raw reply

* Re: RFC: android logger feedback request
From: Greg KH @ 2011-12-22  0:51 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <4EF27B85.9080801@am.sony.com>

On Wed, Dec 21, 2011 at 04:36:21PM -0800, Tim Bird wrote:
> On 12/21/2011 03:19 PM, Greg KH wrote:
> > That all describes the current code, but you haven't described what's
> > wrong with the existing syslog interface that requires this new driver
> > to be written.  And why can't the existing interface be fixed to address
> > these (potential) shortcomings?
> 
> 
> >> One specific question I have is where is the most appropriate
> >> place for this code to live, in the kernel source tree?
> >> Other embedded systems might want to use this system (it
> >> is simpler than syslog, and superior in some ways), so I don't
> >> think it should remain in an android-specific directory.
> > 
> > What way is it superior?
> 
> Here are some ways that this code is superior to syslog:
> 
> Speed and overhead
> ------------------
> syslogd requires a separate user-space process to manage
> the log area, where this code does not.  The overhead
> for any user-space process is at least 16K, and usually much
> more than this (not including the size of the log storage
> area).  On one of my embedded systems, where syslogd is
> provided by busybox, the unique set size for syslogd
> is 96K.  This code, built in to the Linux kernel is less
> than 4K code and data (again, not including the size of
> the log storage area).

Huh, I'm not talking about syslogd, I'm talking about the syslog(2)
syscall we have.

This character interface seems very close to the syslog(2) api, but just
done in a character interface, with ioctls, which also require userspace
tools to manage properly, so I fail to see the big "gain" here.

What am I missing?

> To deliver a message to syslog requires a socket operation
> and a few context switches.

For syslog(2)?  What socket?

> With the logger code,
> the operation is a file operation (writev) to kernel memory,
> with only one context switch into and out of the kernel.

I think we are thinking of different apis here...

> No dependence on persistent storage
> -----------------------------------
> syslogd requires either persistent storage to store the log,
> or a network connection to an outside device.  Being
> purely memory-based, the logger requires neither of these.
> With logger, persistence of the log data is left to the
> implementor.  In Android, the data is delivered over a USB
> connection via adb or to the console as ascii text, using
> logcat.  In other embedded systems, other mechanisms might
> be used if long-term storage of the messages is desired.
> With logger, there is no automatic notion of on-device
> persistent storage for the log data.
> 
> No dependence on networking kernel code
> ---------------------------------------
> The syslog communication mechanism requires sockets.  This
> prevents one from configuring the kernel with no networking
> support, which is sometimes done in embedded systems to save
> size.
> 
> Simpler constraint on log size
> ------------------------------
> The busybox syslog daemon uses a log rotation feature to constrain
> the size of the log in persistent storage.  This is overly
> cumbersome in both speed and complexity compared to the logger's
> simple ring buffer.
> 
> Licensing
> ---------
> The code implementing library and command line tool support
> for this logger (in user space) is available under an Apache license,
> rather than a GPL license, which is desirable for some vendors.

Ok, care to rewrite all of this when thinking of syslog(2), the kernel
system call, and not syslogd, which is what I was not comparing this
kernel driver to at all?

> > How does this tie into Kay
> > and Lennard's proposal for work in this area?
> 
> It does not tie in at all.
> 
> Kay and Lennart's proposal for "the Journal" creates
> a more complex system than syslog, and handles a number
> of new interesting use cases.  This system is on the
> opposite side of the spectrum from the journal, towards
> simplicity and reduced footprint and overhead.

No, it's much the same, for the kernel side, which is what I am
referring to here.

greg k-h

^ permalink raw reply

* Re: RFC: android logger feedback request
From: David Brown @ 2011-12-22  0:59 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz, Greg KH,
	Brian Swetland
In-Reply-To: <4EF264C3.6000104@am.sony.com>

On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:

> In Android, this system uses a fixed set of device nodes with
> well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> and /dev/log/system.

These names seem very specific to Android's use case.  Would we want
the mechanism to be more general, or configurable, so that another
embedded-type system would be able to have their own log types.

But, the biggest question I have is to understand why this is a kernel
driver.  In essence, it is just shuttling data from various processes
to something that eventually needs to read that data.  In other words,
it is doing a subset of what syslog does.  If the concern is about a
userspace program crashing, wouldn't the userspace tool that reads
this data also be able to crash?

The driver clearly has no kernel API, since it exports no symbols.  I
could see more of an argument for it if certain kernel things were
able to write to the log, but then we are also duplicating other
functionality.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply

* Re: RFC: android logger feedback request
From: john stultz @ 2011-12-22  1:00 UTC (permalink / raw)
  To: Tim Bird
  Cc: Greg KH, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <1324514996.30527.150.camel@work-vm>

On Wed, 2011-12-21 at 16:49 -0800, john stultz wrote:
> On Wed, 2011-12-21 at 16:42 -0800, Tim Bird wrote:
> > On 12/21/2011 04:18 PM, john stultz wrote:
> > > On Wed, 2011-12-21 at 15:19 -0800, Greg KH wrote:
> > >> On Wed, Dec 21, 2011 at 02:59:15PM -0800, Tim Bird wrote:
> > >>> Hi all,
> > >>>
> > >>> I'm looking for feedback on the Android logger code, to see what
> > >>> it would take to make this code acceptable for inclusion in
> > >>> the mainline kernel.
> > >>>
> > >>> Information about the features of Android logging system
> > >>> can be found at: http://elinux.org/Android_Logging_System
> > >>>
> > >>> This system creates a new system-wide logging service, in
> > >>> the kernel, for user-space message.  It is more comparable
> > >>> to syslog than to the kernel log buffer, as it holds only
> > >>> user-space messages.  It is optimized for write
> > >>> performance, since most of the time the log is written to
> > >>> and never read.  It creates multiple log channels, to prevent
> > >>> an abundance of log messages in one channel from overwriting
> > >>> messages in another channel.  The log channels have sizes
> > >>> fixed at kernel compile-time.
> > >>>
> > >>> Log messages are stored in very simple in-kernel buffers, that
> > >>> overflow old messages upon wrapping.  A fixed set of attributes
> > >>> (pid, tid, timestamp and message), is kept for each message.
> > >>> By convention, Android puts a message priority and context tag
> > >>> into each message.
> > >>>
> > >>> In Android, this system uses a fixed set of device nodes with
> > >>> well-known names: /dev/log/main, /dev/log/events, /dev/log/radio
> > >>> and /dev/log/system.
> > >>>
> > >>> Operations on the log are done via a character device, using
> > >>> standard file operations and some ioctls.
> > >>>
> > >>> The code for this is below (I've moved it from linux-next
> > >>> drivers/staging/android for my own testing).
> > >>>
> > >>> Please let me know what issues you see with this code.
> > >>
> > >> That all describes the current code, but you haven't described what's
> > >> wrong with the existing syslog interface that requires this new driver
> > >> to be written.  And why can't the existing interface be fixed to address
> > >> these (potential) shortcomings?
> > >>
> > >>> One specific question I have is where is the most appropriate
> > >>> place for this code to live, in the kernel source tree?
> > >>> Other embedded systems might want to use this system (it
> > >>> is simpler than syslog, and superior in some ways), so I don't
> > >>> think it should remain in an android-specific directory.
> > >>
> > >> What way is it superior?  Again, why not extend syslog?  Why not "fix"
> > >> syslog if this really is a superior thing?  How does this tie into Kay
> > >> and Lennard's proposal for work in this area?
> > > 
> > > There is also some overlap functionality wise with pstore as well, as I
> > > believe the logger is used as a known location in memory where messages
> > > can be fetched from after a kernel panic or crash.
> > 
> > I don't know if that's true or not.  I think you may be thinking
> > of Android's RAM console feature.  If there's a way to save
> > application messages over a reboot using the logger buffers, I'm
> > unfamiliar with it.
> 
> I may very well be confusing things. I had thought apanic pulled the
> logger data, but maybe I'm wrong.

Looks like I am wrong. Apologies. Thanks for pointing my error out.
-john

^ permalink raw reply

* Re: RFC: android logger feedback request
From: john stultz @ 2011-12-22  1:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Tim Bird, linux-embedded, linux kernel, Arnd Bergmann,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <1324514820.30527.148.camel@work-vm>

On Wed, 2011-12-21 at 16:47 -0800, john stultz wrote:
> Also, if I understand correctly, logger also is able to store messages
> from hardware modems or other hardware that run their own micro-os in a
> central location. This may be closer to the pstore mce bits, but its not
> clear yet to me how much overlap is there.

Looking at the driver, this point seems to be inaccurate as well.  So
sorry for adding to any confusion here. 

thanks
-john

^ permalink raw reply

* Re: RFC: android logger feedback request
From: NeilBrown @ 2011-12-22  1:20 UTC (permalink / raw)
  To: Tim Bird
  Cc: Greg KH, linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <4EF27B85.9080801@am.sony.com>

[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]

On Wed, 21 Dec 2011 16:36:21 -0800 Tim Bird <tim.bird@am.sony.com> wrote:

> On 12/21/2011 03:19 PM, Greg KH wrote:
> > That all describes the current code, but you haven't described what's
> > wrong with the existing syslog interface that requires this new driver
> > to be written.  And why can't the existing interface be fixed to address
> > these (potential) shortcomings?
> 
> 
> >> One specific question I have is where is the most appropriate
> >> place for this code to live, in the kernel source tree?
> >> Other embedded systems might want to use this system (it
> >> is simpler than syslog, and superior in some ways), so I don't
> >> think it should remain in an android-specific directory.
> > 
> > What way is it superior?
> 
> Here are some ways that this code is superior to syslog:

It is certainly nice and simple.  It really looks more like a filesystem than
a char device though...  though they aren't really files so much as lossy
pipes.  I don't think that's a problem though, lots of things in filesystems
don't behave exactly like files.

If you created a 'logbuf' filesystem that used libfs to provide a single
directory in which privileged processes could create files then you wouldn't
need the kernel to "know" the allowed logs: radio, events, main, system.
The size could be set by ftruncate() (by privileged used again) rather than
being hardcoded.

You would defined 'read' and 'write' much like you currently do to create a list of
datagrams in a circular buffer and replace the ioctls by more standard
interfaces:

LOGGER_GET_LOG_BUG_SIZE would use 'stat' and the st_blocks field
LOGGER_GET_LOG_LEN would use 'stat' and the st_size field
LOGGER_GET_NEXT_ENTRY_LEN could use the FIONREAD ioctl
LOGGER_FLUSH_LOG could use ftruncate

The result would be much the same amount of code, but an interface which has
fewer details hard-coded and is generally more versatile and accessible.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Tim Bird @ 2011-12-22  1:32 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <20111222005114.GA22541@suse.de>

On 12/21/2011 04:51 PM, Greg KH wrote:
> On Wed, Dec 21, 2011 at 04:36:21PM -0800, Tim Bird wrote:
>> On 12/21/2011 03:19 PM, Greg KH wrote:

> Huh, I'm not talking about syslogd, I'm talking about the syslog(2)
> syscall we have.

OK - switching gears.  Since the kernel log buffer isn't normally
used to store use-space messages, I thought you were referring
to syslog(3) and the associated (logger(1) and syslogd(8)).

> This character interface seems very close to the syslog(2) api, but just
> done in a character interface, with ioctls, which also require userspace
> tools to manage properly, so I fail to see the big "gain" here.
> 
> What am I missing?

syslog(2) would more aptly be named klogctrl() (and it is in glibc)

There's currently no operation in sys_sylog (the kernel function
implementing syslog(2)) for writing to the log.  The write operation
to the kernel log buffer is also done via a character interface
/dev/kmsg (via code in drivers/char/mem.c)  This is actually very
similar to what the Android logger code does.

But while the kernel log buffer has lots of similarities to the Android logger
there are some key differences which I think are important to isolate
from a user-space logging system.

Here's a stream-of-consciousness dump of the differences:

The printk interface in the kernel is almost always automatically drained
to the device console, at the time of the printk (after the message is dropped
into the log buffer itself).  This extra operation is not needed for most
application-level messages that go into the log, and incurs extra overhead
in the log buffer code.

The printk code is especially designed to be called from within any kernel
context (including interrupt code), and so has some locking avoidance code
paths and complexity that are not needed for code which handles strictly
user-space messages.

Oddly enough, the printk code paths in the kernel can end up doing
a fair amount of print formatting, which can be time-consuming.  The code
path in kmsg_writev() contains at least one kmalloc, which could fail
when running out of memory.  The code path in the logger is much simpler,
consisting really of only a data copy.

Timestamping is not automatically appended to messages going into the
kernel log buffer (but they can be optionally pre-pended, with control
configurable at runtime).  They are represented
as ASCII text, which consumes a little more than twice the overhead of
a 32-bit binary field.  PID and TID are not automatically preserved in
the log. The kernel keeps it's priority in text also, and has no convention
for contextual tagging.  I'm not sure that we should change the
kernel log buffer to support structured binary data, in addition to the
free-form ASCII data that the kernel uses now.

The kernel log buffer does not support separate channels for different
classes of log messages (indeed, there is only one channel, and it has
kernel messages).  A new system call (or some backwards-compatible tweak
to the existing syslog(2) call would have to be added to support
a channel ID in order to support this.

There *are* some benefits to intermingling the kernel log messages and the
user-space log messages, but I think they are outweighed by the
value in keeping these systems separate.  There might be the opportunity
for code reuse, but I suspect we'd end up with about the same amount
of code increase overall (and possibly an additional syscall), and add
some unneeded complexity to the prink code path to accomplish it.

I just read Neil Brown's suggestion for doing this via a filesystem rather
than a char device, and it's interesting.  I'll respond to that separately.

 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Greg KH @ 2011-12-22  1:47 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-embedded, linux kernel, Arnd Bergmann, john stultz,
	Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <4EF288B4.1070601@am.sony.com>

On Wed, Dec 21, 2011 at 05:32:36PM -0800, Tim Bird wrote:
> On 12/21/2011 04:51 PM, Greg KH wrote:
> > On Wed, Dec 21, 2011 at 04:36:21PM -0800, Tim Bird wrote:
> >> On 12/21/2011 03:19 PM, Greg KH wrote:
> 
> > Huh, I'm not talking about syslogd, I'm talking about the syslog(2)
> > syscall we have.
> 
> OK - switching gears.  Since the kernel log buffer isn't normally
> used to store use-space messages, I thought you were referring
> to syslog(3) and the associated (logger(1) and syslogd(8)).

The kernel log buffer has been storing userspace messages for a while
now, look at your boot log on the latest Fedora and openSUSE releases
(or any other distro using systemd for booting).

> > This character interface seems very close to the syslog(2) api, but just
> > done in a character interface, with ioctls, which also require userspace
> > tools to manage properly, so I fail to see the big "gain" here.
> > 
> > What am I missing?
> 
> syslog(2) would more aptly be named klogctrl() (and it is in glibc)

Maybe, but this is a kernel mailing list, we don't worry about glibc
much here :)

> There's currently no operation in sys_sylog (the kernel function
> implementing syslog(2)) for writing to the log.  The write operation
> to the kernel log buffer is also done via a character interface
> /dev/kmsg (via code in drivers/char/mem.c)  This is actually very
> similar to what the Android logger code does.

Again, see above, this has been in the kernel for quite a while now...

> But while the kernel log buffer has lots of similarities to the Android logger
> there are some key differences which I think are important to isolate
> from a user-space logging system.
> 
> Here's a stream-of-consciousness dump of the differences:
> 
> The printk interface in the kernel is almost always automatically drained
> to the device console, at the time of the printk (after the message is dropped
> into the log buffer itself).  This extra operation is not needed for most
> application-level messages that go into the log, and incurs extra overhead
> in the log buffer code.
> 
> The printk code is especially designed to be called from within any kernel
> context (including interrupt code), and so has some locking avoidance code
> paths and complexity that are not needed for code which handles strictly
> user-space messages.
> 
> Oddly enough, the printk code paths in the kernel can end up doing
> a fair amount of print formatting, which can be time-consuming.  The code
> path in kmsg_writev() contains at least one kmalloc, which could fail
> when running out of memory.  The code path in the logger is much simpler,
> consisting really of only a data copy.
> 
> Timestamping is not automatically appended to messages going into the
> kernel log buffer (but they can be optionally pre-pended, with control
> configurable at runtime).  They are represented
> as ASCII text, which consumes a little more than twice the overhead of
> a 32-bit binary field.  PID and TID are not automatically preserved in
> the log. The kernel keeps it's priority in text also, and has no convention
> for contextual tagging.  I'm not sure that we should change the
> kernel log buffer to support structured binary data, in addition to the
> free-form ASCII data that the kernel uses now.
> 
> The kernel log buffer does not support separate channels for different
> classes of log messages (indeed, there is only one channel, and it has
> kernel messages).  A new system call (or some backwards-compatible tweak
> to the existing syslog(2) call would have to be added to support
> a channel ID in order to support this.
> 
> There *are* some benefits to intermingling the kernel log messages and the
> user-space log messages, but I think they are outweighed by the
> value in keeping these systems separate.  There might be the opportunity
> for code reuse, but I suspect we'd end up with about the same amount
> of code increase overall (and possibly an additional syscall), and add
> some unneeded complexity to the prink code path to accomplish it.

Again, please see what we are already doing in the kernel and userspace,
I think a lot of the above is already implemented.

Which brings me back to my original question, what does this code do,
that is not already present in the kernel, and why is a totally new
interface being proposed for it?

thanks,

greg k-h

^ permalink raw reply

* Re: RFC: android logger feedback request
From: Greg KH @ 2011-12-22  1:49 UTC (permalink / raw)
  To: NeilBrown
  Cc: Tim Bird, linux-embedded, linux kernel, Arnd Bergmann,
	john stultz, Brian Swetland, Kay Sievers, Lennart Poettering
In-Reply-To: <20111222122026.3a0fae36@notabene.brown>

On Thu, Dec 22, 2011 at 12:20:26PM +1100, NeilBrown wrote:
> On Wed, 21 Dec 2011 16:36:21 -0800 Tim Bird <tim.bird@am.sony.com> wrote:
> 
> > On 12/21/2011 03:19 PM, Greg KH wrote:
> > > That all describes the current code, but you haven't described what's
> > > wrong with the existing syslog interface that requires this new driver
> > > to be written.  And why can't the existing interface be fixed to address
> > > these (potential) shortcomings?
> > 
> > 
> > >> One specific question I have is where is the most appropriate
> > >> place for this code to live, in the kernel source tree?
> > >> Other embedded systems might want to use this system (it
> > >> is simpler than syslog, and superior in some ways), so I don't
> > >> think it should remain in an android-specific directory.
> > > 
> > > What way is it superior?
> > 
> > Here are some ways that this code is superior to syslog:
> 
> It is certainly nice and simple.  It really looks more like a filesystem than
> a char device though...  though they aren't really files so much as lossy
> pipes.  I don't think that's a problem though, lots of things in filesystems
> don't behave exactly like files.
> 
> If you created a 'logbuf' filesystem that used libfs to provide a single
> directory in which privileged processes could create files then you wouldn't
> need the kernel to "know" the allowed logs: radio, events, main, system.
> The size could be set by ftruncate() (by privileged used again) rather than
> being hardcoded.
> 
> You would defined 'read' and 'write' much like you currently do to create a list of
> datagrams in a circular buffer and replace the ioctls by more standard
> interfaces:
> 
> LOGGER_GET_LOG_BUG_SIZE would use 'stat' and the st_blocks field
> LOGGER_GET_LOG_LEN would use 'stat' and the st_size field
> LOGGER_GET_NEXT_ENTRY_LEN could use the FIONREAD ioctl
> LOGGER_FLUSH_LOG could use ftruncate
> 
> The result would be much the same amount of code, but an interface which has
> fewer details hard-coded and is generally more versatile and accessible.

But, almost all of this is already in the syslog system call today,
right?  So why create a new user api for something we have?

greg k-h

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox