* Re: [PATCH v2] printk: Improve memory usage logging during boot [not found] ` <ZvwZV5MDlQYGJv9N@pathway.suse.cz> @ 2024-12-01 11:40 ` Geert Uytterhoeven 2024-12-03 9:42 ` Petr Mladek 0 siblings, 1 reply; 3+ messages in thread From: Geert Uytterhoeven @ 2024-12-01 11:40 UTC (permalink / raw) To: Petr Mladek Cc: Isaac J. Manjarres, Steven Rostedt, John Ogness, Sergey Senozhatsky, surenb, kernel-team, linux-kernel, Linux Embedded Hi Petr et al, CC linux-embedded On Tue, Oct 1, 2024 at 5:55 PM Petr Mladek <pmladek@suse.com> wrote: > On Mon 2024-09-30 11:48:24, Isaac J. Manjarres wrote: > > When the initial printk ring buffer size is updated, setup_log_buf() > > allocates a new ring buffer, as well as a set of meta-data structures > > for the new ring buffer. The function also emits the new size of the > > ring buffer, but not the size of the meta-data structures. > > > > This makes it difficult to assess how changing the log buffer size > > impacts memory usage during boot. > > > > For instance, increasing the ring buffer size from 512 KB to 1 MB > > through the command line yields an increase of 2304 KB in reserved > > memory at boot, while the only obvious change is the 512 KB > > difference in the ring buffer sizes: > > > > log_buf_len=512K: > > > > printk: log_buf_len: 524288 bytes > > Memory: ... (... 733252K reserved ...) > > > > log_buf_len=1M: > > > > printk: log_buf_len: 1048576 bytes > > Memory: ... (... 735556K reserved ...) > > > > This is because of how the size of the meta-data structures scale with > > the size of the ring buffer. > > > > Even when there aren't changes to the printk ring buffer size (i.e. the > > initial size == 1 << CONFIG_LOG_BUF_SHIFT), it is impossible to tell > > how much memory is consumed by the printk ring buffer during boot. > > > > Therefore, unconditionally log the sizes of the printk ring buffer > > and its meta-data structures, so that it's easier to understand > > how changing the log buffer size (either through the command line or > > by changing CONFIG_LOG_BUF_SHIFT) affects boot time memory usage. > > > > With the new logs, it is much easier to see exactly why the memory > > increased by 2304 KB: > > > > log_buf_len=512K: > > > > printk: log_buf_len: 524288 bytes > > printk: prb_descs size: 393216 bytes > > printk: printk_infos size: 1441792 bytes > > This should get updated to the new format. > If I count correctly then it should be: > > printk: log buffer data + meta data: 524288 + 1835008 = 2359296 bytes > > > Memory: ... (... 733252K reserved ...) > > > > log_buf_len=1M: > > > > printk: log_buf_len: 1048576 bytes > > printk: prb_descs size: 786432 bytes > > printk: printk_infos size: 2883584 bytes > > and here: > > printk: log buffer data + meta data: 1048576 + 3670016 = 4718592 bytes Thanks, this is very useful information! Isn't this kernel log message bookkeeping becoming a bit too excessive? E.g. on a small system: printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes Why is the meta data that big (> 3*log buffer)? #define PRB_AVGBITS 5 /* 32 character average length */ unsigned int descs_count = log_buf_len >> PRB_AVGBITS; meta_data_size = descs_count * (sizeof(struct prb_desc) + sizeof(struct printk_info)); struct prb_data_blk_lpos { unsigned long begin; unsigned long next; }; struct prb_desc { atomic_long_t state_var; struct prb_data_blk_lpos text_blk_lpos; }; i.e. 12 bytes on 32-bit, 24 bytes on 64-bit. #define PRINTK_INFO_SUBSYSTEM_LEN 16 #define PRINTK_INFO_DEVICE_LEN 48 struct dev_printk_info { char subsystem[PRINTK_INFO_SUBSYSTEM_LEN]; char device[PRINTK_INFO_DEVICE_LEN]; }; struct printk_info { u64 seq; /* sequence number */ u64 ts_nsec; /* timestamp in nanoseconds */ u16 text_len; /* length of text message */ u8 facility; /* syslog facility */ u8 flags:5; /* internal record flags */ u8 level:3; /* syslog level */ u32 caller_id; /* thread id or processor id */ struct dev_printk_info dev_info; }; for a whopping 88 bytes. So that totals to 100 bytes per entry on 32-bit, and 112 on 64-bit, for an average of 32 characters per printed message... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] printk: Improve memory usage logging during boot 2024-12-01 11:40 ` [PATCH v2] printk: Improve memory usage logging during boot Geert Uytterhoeven @ 2024-12-03 9:42 ` Petr Mladek 2024-12-04 17:17 ` John Ogness 0 siblings, 1 reply; 3+ messages in thread From: Petr Mladek @ 2024-12-03 9:42 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Isaac J. Manjarres, Steven Rostedt, John Ogness, Sergey Senozhatsky, surenb, kernel-team, linux-kernel, Linux Embedded On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote: > Hi Petr et al, > > CC linux-embedded > > On Tue, Oct 1, 2024 at 5:55 PM Petr Mladek <pmladek@suse.com> wrote: > > On Mon 2024-09-30 11:48:24, Isaac J. Manjarres wrote: > > > When the initial printk ring buffer size is updated, setup_log_buf() > > > allocates a new ring buffer, as well as a set of meta-data structures > > > for the new ring buffer. The function also emits the new size of the > > > ring buffer, but not the size of the meta-data structures. > > > > > > This makes it difficult to assess how changing the log buffer size > > > impacts memory usage during boot. > > > > > > For instance, increasing the ring buffer size from 512 KB to 1 MB > > > through the command line yields an increase of 2304 KB in reserved > > > memory at boot, while the only obvious change is the 512 KB > > > difference in the ring buffer sizes: > > > > > > log_buf_len=512K: > > > > > > printk: log_buf_len: 524288 bytes > > > Memory: ... (... 733252K reserved ...) > > > > > > log_buf_len=1M: > > > > > > printk: log_buf_len: 1048576 bytes > > > Memory: ... (... 735556K reserved ...) > > > > > > This is because of how the size of the meta-data structures scale with > > > the size of the ring buffer. > > > > > > Even when there aren't changes to the printk ring buffer size (i.e. the > > > initial size == 1 << CONFIG_LOG_BUF_SHIFT), it is impossible to tell > > > how much memory is consumed by the printk ring buffer during boot. > > > > > > Therefore, unconditionally log the sizes of the printk ring buffer > > > and its meta-data structures, so that it's easier to understand > > > how changing the log buffer size (either through the command line or > > > by changing CONFIG_LOG_BUF_SHIFT) affects boot time memory usage. > > > > > > With the new logs, it is much easier to see exactly why the memory > > > increased by 2304 KB: > > > > > > log_buf_len=512K: > > > > printk: log buffer data + meta data: 524288 + 1835008 = 2359296 bytes > > > > > log_buf_len=1M: > > > > printk: log buffer data + meta data: 1048576 + 3670016 = 4718592 bytes > > Thanks, this is very useful information! > > Isn't this kernel log message bookkeeping becoming a bit too excessive? > E.g. on a small system: > > printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes > > Why is the meta data that big (> 3*log buffer)? > > #define PRB_AVGBITS 5 /* 32 character average length */ > > unsigned int descs_count = log_buf_len >> PRB_AVGBITS; > meta_data_size = descs_count * (sizeof(struct prb_desc) + > sizeof(struct printk_info)); > > struct prb_data_blk_lpos { > unsigned long begin; > unsigned long next; > }; > > struct prb_desc { > atomic_long_t state_var; > struct prb_data_blk_lpos text_blk_lpos; > }; I am afraid that we could not do much about the size of this part. All the variables are important parts of the lockless machinery. > i.e. 12 bytes on 32-bit, 24 bytes on 64-bit. > > #define PRINTK_INFO_SUBSYSTEM_LEN 16 > #define PRINTK_INFO_DEVICE_LEN 48 > > struct dev_printk_info { > char subsystem[PRINTK_INFO_SUBSYSTEM_LEN]; > char device[PRINTK_INFO_DEVICE_LEN]; > }; This is probably the lowest hanging fruit. It should be possible to write these dev_printk-specific metadata into the data buffer a more efficient way and only for records created by dev_printk(). It would require adding "dict_len" into "struct printk_info" and reserving space for both "text_len" and "dict_len". We bundled it into the metadata because these are metadata. We wanted to keep the design as clean as possible. We focused mainly on the stability and maintainability of the code. And it was really challenging to get it all working. > struct printk_info { > u64 seq; /* sequence number */ I do not recal the details. But I think that we need to explicitely store the 64-bit "seq" number in the metadata because of the lockless algoritm. It helps to solve problems with wrapping of the counter in "atomic_long_t state_var". It was not stored before. The following global values were enough when the log buffer was synchronized by "logbuf_lock" spin lock: static u64 log_first_seq; static u64 log_next_seq; > u64 ts_nsec; /* timestamp in nanoseconds */ > u16 text_len; /* length of text message */ > u8 facility; /* syslog facility */ > u8 flags:5; /* internal record flags */ > u8 level:3; /* syslog level */ > u32 caller_id; /* thread id or processor id */ These metadata used to be stored in the "data" buffer next to the message. Here is the declaration from v6.9: struct printk_log { u64 ts_nsec; /* timestamp in nanoseconds */ u16 len; /* length of entire record */ u16 text_len; /* length of text buffer */ u16 dict_len; /* length of dictionary buffer */ u8 facility; /* syslog facility */ u8 flags:5; /* internal record flags */ u8 level:3; /* syslog level */ #ifdef CONFIG_PRINTK_CALLER u32 caller_id; /* thread id or processor id */ #endif } > struct dev_printk_info dev_info; As I wrote above. It should be possible to store these metadata more effectively in the data buffer. But note that it is only about the internal kernel code. These change would require also updating external tools, for example "crash" tool. > }; > > for a whopping 88 bytes. So that totals to 100 bytes per entry > on 32-bit, and 112 on 64-bit, for an average of 32 characters per > printed message... It would be interesting to know how much if these are wasted because either struct dev_printk_info is empty or the entries are shorter. Best Regards, Petr ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] printk: Improve memory usage logging during boot 2024-12-03 9:42 ` Petr Mladek @ 2024-12-04 17:17 ` John Ogness 0 siblings, 0 replies; 3+ messages in thread From: John Ogness @ 2024-12-04 17:17 UTC (permalink / raw) To: Petr Mladek, Geert Uytterhoeven Cc: Isaac J. Manjarres, Steven Rostedt, Sergey Senozhatsky, surenb, kernel-team, linux-kernel, Linux Embedded On 2024-12-03, Petr Mladek <pmladek@suse.com> wrote: > On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote: >> Isn't this kernel log message bookkeeping becoming a bit too excessive? >> E.g. on a small system: >> >> printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes >> >> Why is the meta data that big (> 3*log buffer)? >> >> #define PRB_AVGBITS 5 /* 32 character average length */ >> >> unsigned int descs_count = log_buf_len >> PRB_AVGBITS; >> meta_data_size = descs_count * (sizeof(struct prb_desc) + >> sizeof(struct printk_info)); >> >> struct prb_data_blk_lpos { >> unsigned long begin; >> unsigned long next; >> }; >> >> struct prb_desc { >> atomic_long_t state_var; >> struct prb_data_blk_lpos text_blk_lpos; >> }; >> >> i.e. 12 bytes on 32-bit, 24 bytes on 64-bit. > > I am afraid that we could not do much about the size of this part. > All the variables are important parts of the lockless machinery. The descriptor array is one source of wasted space. It ensures there are enough descriptors so that the text_ring can be fully maximized for an average record text length of 32. However, looking at /sys/kernel/debug/printk/index/vmlinux and running some basic tests, it seems the average text length is >=45 (usually around 55). That means at least 30% of the descriptor space is not in use, which is roughtly 5% of the total memory used by all ringbuffer components. For example, for CONFIG_LOG_BUF_SHIFT=13, the amount of wasted desc_ring space is about 1.8KiB. (The total memory usage of the ringbuffer is 36KiB.) However, if we bump the expected average size to 64, there will not be enough descriptors, leading to wasted text_ring buffer space. (The expected size must be a power of 2 due to the ID wrapping algorithm.) >> #define PRINTK_INFO_SUBSYSTEM_LEN 16 >> #define PRINTK_INFO_DEVICE_LEN 48 >> >> struct dev_printk_info { >> char subsystem[PRINTK_INFO_SUBSYSTEM_LEN]; >> char device[PRINTK_INFO_DEVICE_LEN]; >> }; > > This is probably the lowest hanging fruit. Yes, the info array is also a source of wasted space. This was known at the time when we discussed [0] introducing this array. Many records do not even use the dev_printk_info fields. (A running system seems to use them more often than a booting system.) However, the 64 bytes is currently quite large. During some testing I typically saw 70% of the dev_printk_info text space for valid descriptors unused. I typically saw dev_printk_info averages of: - 5 bytes for SUBSYSTEM - 12 bytes for DEVICE However, this problem is compounded by the descriptor array issue I mentioned at the beginning. The reason is that there is a 1:1 relationship between descriptors and dev_printk_info structs. So if 30% of the descriptors are invalid, then that adds an additional waste of 30% totally unused dev_printk_info structs. For example, for CONFIG_LOG_BUF_SHIFT=13, that represents a total of 13KiB wasted space in the info array (36% of the total memory usage). > It should be possible to write these dev_printk-specific metadata into > the data buffer a more efficient way and only for records created by > dev_printk(). > > It would require adding "dict_len" into "struct printk_info" > and reserving space for both "text_len" and "dict_len". > > We bundled it into the metadata because these are metadata. > We wanted to keep the design as clean as possible. We focused > mainly on the stability and maintainability of the code. > And it was really challenging to get it all working. I think keeping it bundled in the meta data is correct. But if those text arrays could be allocated from a text ring, then the space could be used efficiently. I am not recommending that we add the dict_ring back. It was considerably more complex. But perhaps we could use the text_ring for these allocations as well. It may even have the benefit that the "average text size" becomes >=64, which would also help with the first wasted item I mentioned. >> struct printk_info { >> u64 seq; /* sequence number */ > > I do not recal the details. But I think that we need to > explicitely store the 64-bit "seq" number in the metadata > because of the lockless algoritm. It helps to solve > problems with wrapping of the counter in > "atomic_long_t state_var". Yes. I could not figure out a way to safely update a @log_first_seq to represent the first sequence available in the ringbuffer. (The complexity involves both writers and readers seeing appropriate sequence values.) If that could be done somehow, that would save another 2KiB (for CONFIG_LOG_BUF_SHIFT=13). In summary... I think the only quick fix we could do immediately is reduce PRINTK_INFO_DEVICE_LEN. On my various test machines, I never encountered anything above 25. Perhaps reducing it from 48 to 32? That would immediately reclaim 11% (4KiB for CONFIG_LOG_BUF_SHIFT=13) and it would not require any changes to the various crash/dump tools. In the longer term we may want to consider using the text ring for additional device/subsystem string allocation. This would only require changes to crash/dump tools that provide the device/subsystem information. In addition, if low-memory single CPU systems are am important target, we might be better off implementing a CONFIG_PRINTK_RINGBUFFER_TINY. Implementing a lockless ringbuffer for a single CPU is trivial (in comparison) and would not require all the management abstractions. If it used the same printk_ringbuffer API it could be a simple drop-in replacement. John Ogness [0] https://lore.kernel.org/lkml/20200904124530.GB20558@alley ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-12-04 17:17 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240930184826.3595221-1-isaacmanjarres@google.com>
[not found] ` <ZvwZV5MDlQYGJv9N@pathway.suse.cz>
2024-12-01 11:40 ` [PATCH v2] printk: Improve memory usage logging during boot Geert Uytterhoeven
2024-12-03 9:42 ` Petr Mladek
2024-12-04 17:17 ` John Ogness
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox