* Document POSIX MQ /proc/sys/fs/mqueue files
@ 2014-09-29 9:10 Michael Kerrisk (man-pages)
2014-09-29 17:28 ` Doug Ledford
2014-09-29 20:23 ` Davidlohr Bueso
0 siblings, 2 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-29 9:10 UTC (permalink / raw)
To: Davidlohr Bueso, Doug Ledford
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins
Hello Doug, David,
I think you two were the last ones to make significant
changes to the semantics of the files in /proc/sys/fs/mqueue,
so I wonder if you (or anyone else who is willing) might
take a look at the man page text below that I've written
(for the mq_overview(7) page) to describe past and current
reality, and let me know of improvements or corrections.
By the way, Doug, your commit ce2d52cc1364 appears to have
changed/broken the semantics of the files in the /dev/mqueue
filesystem. Formerly, the QSIZE field in these files showed
the number of bytes of real user data in all of the queued
messages. After that commit, QSIZE now includes kernel
overhead bytes, which does not seem very useful for user
space. Was that change intentional? I see no mention of the
change in the commit message, so it sounds like it was not
intended.
Cheers,
Michael
From mq_overview(7) draft:
/proc interfaces
The following interfaces can be used to limit the amount of ker‐
nel memory consumed by POSIX message queues and to set the
default attributes for new message queues:
/proc/sys/fs/mqueue/msg_default (since Linux 3.5)
This file defines the value used for a new queue's
mq_maxmsg setting when the queue is created with a call to
mq_open(3) where attr is specified as NULL. The default
value for this file is 10. The minimum and maximum are as
for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
msg_max, a new queue's default mq_maxmsg value is capped
to the msg_max limit. Up until Linux 2.6.28, the default
mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
default was the value defined for the msg_max limit.
/proc/sys/fs/mqueue/msg_max
This file can be used to view and change the ceiling value
for the maximum number of messages in a queue. This value
acts as a ceiling on the attr->mq_maxmsg argument given to
mq_open(3). The default value for msg_max is 10. The
minimum value is 1 (10 in kernels before 2.6.28). The
upper limit is HARD_MSGMAX. The msg_max limit is ignored
for privileged processes (CAP_SYS_RESOURCE), but the
HARD_MSGMAX ceiling is nevertheless imposed.
The definition of HARD_MSGMAX has changed across kernel
versions:
* Up to Linux 2.6.32: 131072 / sizeof(void *)
* Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
* Since Linux 3.5: 65,536
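
To make the pointer-size dependence of the two older definitions concrete, here is a small C sketch that evaluates the three formulas listed above. The enum and function names are hypothetical, chosen only for illustration; the formulas themselves are the ones given in the draft.

```c
#include <stddef.h>

/* Hypothetical labels for the three eras listed in the draft text. */
enum mq_era {
    MQ_ERA_UP_TO_2_6_32,   /* up to Linux 2.6.32 */
    MQ_ERA_2_6_33_TO_3_4,  /* Linux 2.6.33 to 3.4 */
    MQ_ERA_SINCE_3_5       /* since Linux 3.5 */
};

/*
 * Evaluate HARD_MSGMAX for a given era. ptr_size stands in for
 * sizeof(void *) on the target architecture (4 or 8 bytes).
 */
long hard_msgmax(enum mq_era era, size_t ptr_size)
{
    switch (era) {
    case MQ_ERA_UP_TO_2_6_32:
        return (long)(131072 / ptr_size);
    case MQ_ERA_2_6_33_TO_3_4:
        return (long)(32768 * ptr_size / 4);
    case MQ_ERA_SINCE_3_5:
        return 65536;
    }
    return -1; /* unknown era */
}
```

With 8-byte pointers the three eras give 16384, 65536, and 65536; with 4-byte pointers they give 32768, 32768, and 65536.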
/proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
This file defines the value used for a new queue's mq_msg‐
size setting when the queue is created with a call to
mq_open(3) where attr is specified as NULL. The default
value for this file is 8192. The minimum and maximum are
as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
size_default exceeds msgsize_max, a new queue's default
mq_msgsize value is capped to the msgsize_max limit. Up
until Linux 2.6.28, the default mq_msgsize was 8192; from
Linux 2.6.28 to Linux 3.4, the default was the value
defined for the msgsize_max limit.
/proc/sys/fs/mqueue/msgsize_max
This file can be used to view and change the ceiling on
the maximum message size. This value acts as a ceiling on
the attr->mq_msgsize argument given to mq_open(3). The
default value for msgsize_max is 8192 bytes. The minimum
value is 128 (8192 in kernels before 2.6.28). The upper
limit for msgsize_max has varied across kernel versions:
* Before Linux 2.6.28, the upper limit is INT_MAX.
* From Linux 2.6.28 to 3.4, the limit is 1,048,576.
* Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
MAX).
The msgsize_max limit is ignored for privileged processes
(CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
SIZEMAX ceiling is enforced for privileged processes.
/proc/sys/fs/mqueue/queues_max
This file can be used to view and change the system-wide
limit on the number of message queues that can be created.
The default value for queues_max is 256. The semantics of
this limit have changed across kernel versions as follows:
* Before Linux 3.5, this limit could be changed to any
value in the range 0 to INT_MAX, but privileged pro‐
cesses (CAP_SYS_RESOURCE) can exceed the limit.
* Since Linux 3.5, there is a ceiling for this limit of
1024 (HARD_QUEUESMAX). Privileged processes
(CAP_SYS_RESOURCE) can exceed the queues_max limit, but
the HARD_QUEUESMAX limit is enforced even for privi‐
leged processes.
* Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
removed: no ceiling is imposed on the queues_max limit,
and privileged processes (CAP_SYS_RESOURCE) can exceed
the limit.
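
For readers who want to inspect these settings from a program, the following is a minimal sketch in C, not part of the draft. The function name read_mq_limit and its dir/name split are hypothetical; the only thing taken from the text above is that each file holds a single integer value.

```c
#include <stdio.h>

/*
 * Read one integer limit from <dir>/<name>, for example
 * dir = "/proc/sys/fs/mqueue" and name = "msg_max".
 * Returns 0 on success and -1 on any failure.
 * (Hypothetical helper; the name and signature are illustrative.)
 */
int read_mq_limit(const char *dir, const char *name, long *value)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path was truncated or invalid */

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;              /* file absent or not readable */

    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

A caller would typically loop over the five file names described above. Because msg_default and msgsize_default exist only since Linux 3.5, a failure for those two names may simply indicate an older kernel.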
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
2014-09-29 9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
@ 2014-09-29 17:28 ` Doug Ledford
[not found] ` <1412011687.15492.39.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-09-29 20:23 ` Davidlohr Bueso
1 sibling, 1 reply; 10+ messages in thread
From: Doug Ledford @ 2014-09-29 17:28 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Davidlohr Bueso, linux-man@vger.kernel.org, lkml, Madars Vitolins
[-- Attachment #1: Type: text/plain, Size: 11412 bytes --]
On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> Hello Doug, David,
>
> I think you two were the last ones to make significant
> changes to the semantics of the files in /proc/sys/fs/mqueue,
> so I wonder if you (or anyone else who is willing) might
> take a look at the man page text below that I've written
> (for the mq_overview(7) page) to describe past and current
> reality, and let me know of improvements or corrections.
>
> By the way, Doug, your commit ce2d52cc1364 appears to have
> changed/broken the semantics of the files in the /dev/mqueue
> filesystem. Formerly, the QSIZE field in these files showed
> the number of bytes of real user data in all of the queued
> messages. After that commit, QSIZE now includes kernel
> overhead bytes, which does not seem very useful for user
> space. Was that change intentional? I see no mention of the
> change in the commit message, so it sounds like it was not
> intended.
That change didn't come in that commit. That commit modified it, but
didn't introduce it.
Now, was it intentional? Yes. Is it valuable, useful? That depends on
your perspective.
One of the problems I ran into with that code relates to the rlimit
checks that happen at queue creation time. We used to check to see if
msg_num * (msg_size + sizeof(struct msg_msg *))
would fit within the user's currently available rlimit for
RLIMIT_MSGQUEUE. This was not an accurate check though. It accounted
for the msg number, and the payload size, and the array of pointers we
used to point to the msg_msg structs that held each message, but ignored
the msg_msg structs themselves. Given that we accept the creation of
message queues with a msg_size of 1, this could be used to create a
minor DoS because of the fact that there was such a large size
difference between the sizeof struct msg_msg and the size of our
messages. In this scenario, a msg_size of 1 would result in us
accounting 9/5 bytes per message on 64-bit/32-bit OSes respectively, but
actually using 49 bytes/19 bytes respectively. That's a 4:1 ratio at the
worst case for the difference between actual memory used and memory usage
accounted against the RLIMIT_MSGQUEUE limit. So before I ever got around
to doing the rbtree update, I fixed this to at least be more accurate
and it became
msg_num * (msg_size + sizeof(struct msg_msg *) + sizeof(struct msg_msg))
Even this wasn't totally accurate though, as large messages could result
in the allocation of additional msg_msgseg segments. However, I ignored
that inaccuracy because once the message size is large enough to need
additional SG segments, we are no longer in danger of any sort of minor
DoS because our own overhead will become nothing more than noise to the
calculation.
When I then changed things to use rbtrees, I again updated the way we
calculate memory consumed by a queue. The rbtrees are used one per
priority with a list head attached to our rbtree node so that once we
locate our given priority, we have O(1) insertion and removal of
messages. It just so happens that, sometime long ago, someone set our
maximum number of priorities we support in Linux at 32768. This kills
us on our memory calculations because the size of the msg_tree_node
struct is another 40 bytes on 64bit. That means if someone creates a
message queue with 32768 max_msgs, and a msg_size of 1, they can cause
us to allocate 32768 struct msg_msg, 32768 struct posix_msg_tree_node,
and 32768 * 1 payload. In order to protect against that sort of
exploitation, the new memory usage calculation had to become:
msg_num * (msg_size + sizeof(struct msg_msg)) +
sizeof(struct posix_msg_tree_node) * min(msg_num, max_priorities)
So, that's how we now calculate the size of a queue when checking it
against RLIMIT_MSGQUEUE to see if the user has the ability to create a
new queue. This is now reasonably accurate, and it closes up what would
have been a minimum of an order of magnitude error between the worst
case scenario's actual memory usage and accounted memory usage.
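
The calculation described above can be sketched in C as follows. This is an illustration under stated assumptions, not the kernel's code: the structure sizes are passed in as parameters because they are kernel-internal and vary by architecture and version, and the names mq_sizes and mq_accounted_bytes are hypothetical.

```c
#include <stddef.h>

/* Number of priorities, as stated in the text above. */
#define MQ_MAX_PRIORITIES 32768UL

/*
 * Kernel-internal structure sizes, supplied by the caller.
 * These are assumptions for illustration; real values depend on
 * the architecture and kernel version.
 */
struct mq_sizes {
    size_t msg_msg;    /* sizeof(struct msg_msg) */
    size_t tree_node;  /* sizeof(struct posix_msg_tree_node) */
};

/*
 * Bytes charged against RLIMIT_MSGQUEUE for one queue:
 *   msg_num * (msg_size + msg_msg)
 *     + tree_node * min(msg_num, max_priorities)
 */
unsigned long mq_accounted_bytes(unsigned long msg_num,
                                 unsigned long msg_size,
                                 const struct mq_sizes *sz)
{
    unsigned long nodes =
        msg_num < MQ_MAX_PRIORITIES ? msg_num : MQ_MAX_PRIORITIES;

    return msg_num * (msg_size + sz->msg_msg) + sz->tree_node * nodes;
}
```

For example, taking 40 bytes for the tree node (the 64-bit figure mentioned above) and an assumed 48 bytes for struct msg_msg, a queue of 32768 one-byte messages is charged 32768 * 49 + 40 * 32768 = 2916352 bytes, even though it can hold only 32768 bytes of user data.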
With this change in place, people that used to be able to allocate lots
of large queues of very small messages suddenly needed to adjust their
RLIMIT_MSGQUEUE to be able to continue. I contend this is the right
thing, but it is a surprise to some people. At the time, I had thought
that the sizeof struct msg_msg was already accounted for in the QSIZE
output. So I had added the rbtree size in too so that users could see
their currently used memory more accurately. Going back and looking
now, that was a mistake on my part as the size of struct msg_msg is not
included in that number, so it wasn't correct to add the rbtree size
there either (or at a minimum if I was going to add one, I should have
added both, but this in-between land makes no sense). However, I think
it's probably worth adding a new field to the end of that data output
that does reflect both struct msg_msg and struct posix_msg_tree_node
allocations so that users can see the overhead of their current queue
usage, especially in light of the changes to how the rlimit is enforced.
And I would say that putting the data element back to an exact match to
the number of user data bytes currently in queue makes sense.
I've been trying to think of a way to tackle the priorities problem
anyway. That we have a default, and unchangeable, setting of 32768
priorities precludes having lots of small messages in queue without
having to plan for huge amounts of overhead. I think it's worth
investigating some method of allowing the supported number of priorities
for queues (either system wide or per namespace or per queue) to be
reduced in the name of efficiency. I can bump that work up my priority
list and take care of fixing up the DATA field at the same time.
The man page below looks fine to me. It covers the various
incarnations. If I add some tweaks to the priorities value though, it
will need updating again ;-)
Although this section wasn't included below, I would update how the
memory is calculated to match what I wrote above. However, I would also
put in a notation that the calculation can change when the kernel's
internal implementation changes and resource usage therefore changes.
> Cheers,
>
> Michael
>
> From mq_overview(7) draft:
>
> /proc interfaces
> The following interfaces can be used to limit the amount of ker‐
> nel memory consumed by POSIX message queues and to set the
> default attributes for new message queues:
>
> /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
> This file defines the value used for a new queue's
> mq_maxmsg setting when the queue is created with a call to
> mq_open(3) where attr is specified as NULL. The default
> value for this file is 10. The minimum and maximum are as
> for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
> msg_max, a new queue's default mq_maxmsg value is capped
> to the msg_max limit. Up until Linux 2.6.28, the default
> mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
> default was the value defined for the msg_max limit.
>
> /proc/sys/fs/mqueue/msg_max
> This file can be used to view and change the ceiling value
> for the maximum number of messages in a queue. This value
> acts as a ceiling on the attr->mq_maxmsg argument given to
> mq_open(3). The default value for msg_max is 10. The
> minimum value is 1 (10 in kernels before 2.6.28). The
> upper limit is HARD_MSGMAX. The msg_max limit is ignored
> for privileged processes (CAP_SYS_RESOURCE), but the
> HARD_MSGMAX ceiling is nevertheless imposed.
>
> The definition of HARD_MSGMAX has changed across kernel
> versions:
>
> * Up to Linux 2.6.32: 131072 / sizeof(void *)
>
> * Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
>
> * Since Linux 3.5: 65,536
>
> /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
> This file defines the value used for a new queue's mq_msg‐
> size setting when the queue is created with a call to
> mq_open(3) where attr is specified as NULL. The default
> value for this file is 8192. The minimum and maximum are
> as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
> size_default exceeds msgsize_max, a new queue's default
> mq_msgsize value is capped to the msgsize_max limit. Up
> until Linux 2.6.28, the default mq_msgsize was 8192; from
> Linux 2.6.28 to Linux 3.4, the default was the value
> defined for the msgsize_max limit.
>
> /proc/sys/fs/mqueue/msgsize_max
> This file can be used to view and change the ceiling on
> the maximum message size. This value acts as a ceiling on
> the attr->mq_msgsize argument given to mq_open(3). The
> default value for msgsize_max is 8192 bytes. The minimum
> value is 128 (8192 in kernels before 2.6.28). The upper
> limit for msgsize_max has varied across kernel versions:
>
> * Before Linux 2.6.28, the upper limit is INT_MAX.
>
> * From Linux 2.6.28 to 3.4, the limit is 1,048,576.
>
> * Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
> MAX).
>
> The msgsize_max limit is ignored for privileged process
> (CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
> SIZEMAX ceiling is enforced for privileged processes.
>
> /proc/sys/fs/mqueue/queues_max
> This file can be used to view and change the system-wide
> limit on the number of message queues that can be created.
> The default value for queues_max is 256. The semantics of
> this limit have changed across kernel versions as follows:
>
> * Before Linux 3.5, this limit could be changed to any
> value in the range 0 to INT_MAX, but privileged pro‐
> cesses (CAP_SYS_RESOURCE) can exceed the limit.
>
> * Since Linux 3.5, there is a ceiling for this limit of
> 1024 (HARD_QUEUESMAX). Privileged processes
> (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
> the HARD_QUEUESMAX limit is enforced even for privi‐
> leged processes.
>
> * Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
> removed: no ceiling is imposed on the queues_max limit,
> and privileged processes (CAP_SYS_RESOURCE) can exceed
> the limit.
>
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
2014-09-29 9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
2014-09-29 17:28 ` Doug Ledford
@ 2014-09-29 20:23 ` Davidlohr Bueso
[not found] ` <1412022198.23497.23.camel-dxKd5G12XOI1EaDjlw0dpg@public.gmane.org>
1 sibling, 1 reply; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-29 20:23 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
Manfred Spraul
Hi Michael,
Cc'ing Manfred.
On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> Hello Doug, David,
>
> I think you two were the last ones to make significant
> changes to the semantics of the files in /proc/sys/fs/mqueue,
> so I wonder if you (or anyone else who is willing) might
> take a look at the man page text below that I've written
> (for the mq_overview(7) page) to describe past and current
> reality, and let me know of improvements or corrections.
Over the years posix mqueues have increasingly become a mess *sigh* :-/
Thanks for doing this and untangling some of the historic changes.
> From mq_overview(7) draft:
>
> /proc interfaces
> The following interfaces can be used to limit the amount of ker‐
> nel memory consumed by POSIX message queues and to set the
> default attributes for new message queues:
>
> /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
> This file defines the value used for a new queue's
> mq_maxmsg setting when the queue is created with a call to
> mq_open(3) where attr is specified as NULL. The default
> value for this file is 10. The minimum and maximum are as
> for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
> msg_max, a new queue's default mq_maxmsg value is capped
> to the msg_max limit.
I think rephrasing this would read easier. Basically the behavior is
this:
info->attr.mq_maxmsg = min(ipc_ns->mq_msg_max,
ipc_ns->mq_msg_default);
Something like:
"a new queue's default mq_maxmsg value will be the smallest of msg_default and msg_max"
> Up until Linux 2.6.28, the default
> mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
> default was the value defined for the msg_max limit.
>
> /proc/sys/fs/mqueue/msg_max
> This file can be used to view and change the ceiling value
> for the maximum number of messages in a queue. This value
> acts as a ceiling on the attr->mq_maxmsg argument given to
> mq_open(3). The default value for msg_max is 10. The
> minimum value is 1 (10 in kernels before 2.6.28). The
> upper limit is HARD_MSGMAX. The msg_max limit is ignored
> for privileged processes (CAP_SYS_RESOURCE), but the
> HARD_MSGMAX ceiling is nevertheless imposed.
Note that the HARD_MSGMAX check is done *only* for privileged processes,
regular processes only check against namespace values. This is a pretty
fundamental difference. The same goes of course for msgsize:
if (capable(CAP_SYS_RESOURCE)) {
if (attr->mq_maxmsg > HARD_MSGMAX ||
attr->mq_msgsize > HARD_MSGSIZEMAX)
return -EINVAL;
} else {
if (attr->mq_maxmsg > ipc_ns->mq_msg_max ||
attr->mq_msgsize > ipc_ns->mq_msgsize_max)
return -EINVAL;
}
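
The branch structure quoted above can be modelled in user space as a small C sketch. It covers only the quoted comparison, not any other validation the kernel performs; the HARD_* values are the "since Linux 3.5" figures from the draft, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hard ceilings since Linux 3.5, as given in the draft text. */
#define HARD_MSGMAX      65536L
#define HARD_MSGSIZEMAX  16777216L

/* Per-namespace limits, i.e. the msg_max and msgsize_max files. */
struct mq_ns_limits {
    long msg_max;
    long msgsize_max;
};

/*
 * Return true if the requested attributes pass the quoted check.
 * 'privileged' stands in for capable(CAP_SYS_RESOURCE).
 */
bool mq_attr_within_limits(bool privileged, long maxmsg, long msgsize,
                           const struct mq_ns_limits *ns)
{
    if (privileged)
        /* Privileged: only the hard ceilings are consulted. */
        return maxmsg <= HARD_MSGMAX && msgsize <= HARD_MSGSIZEMAX;

    /* Unprivileged: only the namespace values are consulted. */
    return maxmsg <= ns->msg_max && msgsize <= ns->msgsize_max;
}
```

With the default limits (msg_max 10, msgsize_max 8192), a request for 11 messages is rejected for an unprivileged caller but accepted for a privileged one, while a request for 65537 messages is rejected even when privileged.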
> The definition of HARD_MSGMAX has changed across kernel
> versions:
>
> * Up to Linux 2.6.32: 131072 / sizeof(void *)
>
> * Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
>
> * Since Linux 3.5: 65,536
>
> /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
You might want to mention the units (bytes) when referring to limits.
> This file defines the value used for a new queue's mq_msg‐
> size setting when the queue is created with a call to
> mq_open(3) where attr is specified as NULL. The default
> value for this file is 8192. The minimum and maximum are
> as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
> size_default exceeds msgsize_max, a new queue's default
> mq_msgsize value is capped to the msgsize_max limit. Up
> until Linux 2.6.28, the default mq_msgsize was 8192; from
> Linux 2.6.28 to Linux 3.4, the default was the value
> defined for the msgsize_max limit.
>
> /proc/sys/fs/mqueue/msgsize_max
Ditto here.
> This file can be used to view and change the ceiling on
> the maximum message size. This value acts as a ceiling on
> the attr->mq_msgsize argument given to mq_open(3). The
> default value for msgsize_max is 8192 bytes. The minimum
> value is 128 (8192 in kernels before 2.6.28). The upper
> limit for msgsize_max has varied across kernel versions:
>
> * Before Linux 2.6.28, the upper limit is INT_MAX.
>
> * From Linux 2.6.28 to 3.4, the limit is 1,048,576.
>
> * Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
> MAX).
> The msgsize_max limit is ignored for privileged process
> (CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
> SIZEMAX ceiling is enforced for privileged processes.
>
> /proc/sys/fs/mqueue/queues_max
> This file can be used to view and change the system-wide
> limit on the number of message queues that can be created.
> The default value for queues_max is 256. The semantics of
> this limit have changed across kernel versions as follows:
>
> * Before Linux 3.5, this limit could be changed to any
> value in the range 0 to INT_MAX, but privileged pro‐
> cesses (CAP_SYS_RESOURCE) can exceed the limit.
>
> * Since Linux 3.5, there is a ceiling for this limit of
> 1024 (HARD_QUEUESMAX). Privileged processes
> (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
> the HARD_QUEUESMAX limit is enforced even for privi‐
> leged processes.
>
> * Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
> removed: no ceiling is imposed on the queues_max limit,
> and privileged processes (CAP_SYS_RESOURCE) can exceed
> the limit.
Given that this was treated as a bug that breaks user-space, I don't
think we really want to document the behavior between 3.5 and 3.14 (all
three bullets). Stable kernels back to 3.5 now have the default
behavior, so it's as if nothing ever changed(?). Now, if you explicitly want
to document such a bug, I would agree, but just not mentioning it as
an intentional difference in behavior. Does that make sense?
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
[not found] ` <1412011687.15492.39.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
@ 2014-09-30 10:12 ` Michael Kerrisk (man-pages)
2014-09-30 17:30 ` Davidlohr Bueso
[not found] ` <CAKgNAkj9+Z1yO-zrQ_qWFut7BqOkzNtNCruSSRyTKMpXZOcBaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-01 10:02 ` Michael Kerrisk (man-pages)
1 sibling, 2 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-30 10:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Davidlohr Bueso,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins
Hi Doug,
On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
>>
>> By the way, Doug, your commit ce2d52cc1364 appears to have
>> changed/broken the semantics of the files in the /dev/mqueue
>> filesystem. Formerly, the QSIZE field in these files showed
>> the number of bytes of real user data in all of the queued
>> messages. After that commit, QSIZE now includes kernel
>> overhead bytes, which does not seem very useful for user
>> space. Was that change intentional? I see no mention of the
>> change in the commit message, so it sounds like it was not
>> intended.
>
> That change didn't come in that commit. That commit modified it, but
> didn't introduce it.
(Which commit was it then? d6629859b36 ?)
> Now, was it intentional? Yes. Is it valuable, useful? That depends on
> your perspective.
Thanks for the detailed explanation below. However, I don't understand
why the (useful) work that you describe below necessitated a change in
the QSIZE value that was exposed to user space. Surely the necessary
changes could have been done internally while still leaving QSIZE to
expose the same value it always did? As things stand now (and unless I
am missing something), QSIZE exposes an implementation-specific
internal value that has little meaning or value to user space. And,
it's unfortunate that the commit message made no mention of the fact
that there was an ABI change here.
[...]
> The man page below looks fine to me.
Thanks for checking it!
Cheers,
Michael
> It covers the various
> incarnations. If I add some tweaks to the priorities value though, it
> will need updating again ;-)
>
> Although this section wasn't included below, I would update how the
> memory is calculated to match what I wrote above. However, I would also
> put in a notation that the calculation can change when the kernel's
> internal implementation changes and resource usage therefore changes.
>
>> Cheers,
>>
>> Michael
>>
>> From mq_overview(7) draft:
>>
>> /proc interfaces
>> The following interfaces can be used to limit the amount of ker‐
>> nel memory consumed by POSIX message queues and to set the
>> default attributes for new message queues:
>>
>> /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
>> This file defines the value used for a new queue's
>> mq_maxmsg setting when the queue is created with a call to
>> mq_open(3) where attr is specified as NULL. The default
>> value for this file is 10. The minimum and maximum are as
>> for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
>> msg_max, a new queue's default mq_maxmsg value is capped
>> to the msg_max limit. Up until Linux 2.6.28, the default
>> mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
>> default was the value defined for the msg_max limit.
>>
>> /proc/sys/fs/mqueue/msg_max
>> This file can be used to view and change the ceiling value
>> for the maximum number of messages in a queue. This value
>> acts as a ceiling on the attr->mq_maxmsg argument given to
>> mq_open(3). The default value for msg_max is 10. The
>> minimum value is 1 (10 in kernels before 2.6.28). The
>> upper limit is HARD_MSGMAX. The msg_max limit is ignored
>> for privileged processes (CAP_SYS_RESOURCE), but the
>> HARD_MSGMAX ceiling is nevertheless imposed.
>>
>> The definition of HARD_MSGMAX has changed across kernel
>> versions:
>>
>> * Up to Linux 2.6.32: 131072 / sizeof(void *)
>>
>> * Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
>>
>> * Since Linux 3.5: 65,536
>>
>> /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
>> This file defines the value used for a new queue's mq_msg‐
>> size setting when the queue is created with a call to
>> mq_open(3) where attr is specified as NULL. The default
>> value for this file is 8192. The minimum and maximum are
>> as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
>> size_default exceeds msgsize_max, a new queue's default
>> mq_msgsize value is capped to the msgsize_max limit. Up
>> until Linux 2.6.28, the default mq_msgsize was 8192; from
>> Linux 2.6.28 to Linux 3.4, the default was the value
>> defined for the msgsize_max limit.
>>
>> /proc/sys/fs/mqueue/msgsize_max
>> This file can be used to view and change the ceiling on
>> the maximum message size. This value acts as a ceiling on
>> the attr->mq_msgsize argument given to mq_open(3). The
>> default value for msgsize_max is 8192 bytes. The minimum
>> value is 128 (8192 in kernels before 2.6.28). The upper
>> limit for msgsize_max has varied across kernel versions:
>>
>> * Before Linux 2.6.28, the upper limit is INT_MAX.
>>
>> * From Linux 2.6.28 to 3.4, the limit is 1,048,576.
>>
>> * Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
>> MAX).
>>
>> The msgsize_max limit is ignored for privileged process
>> (CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
>> SIZEMAX ceiling is enforced for privileged processes.
>>
>> /proc/sys/fs/mqueue/queues_max
>> This file can be used to view and change the system-wide
>> limit on the number of message queues that can be created.
>> The default value for queues_max is 256. The semantics of
>> this limit have changed across kernel versions as follows:
>>
>> * Before Linux 3.5, this limit could be changed to any
>> value in the range 0 to INT_MAX, but privileged pro‐
>> cesses (CAP_SYS_RESOURCE) can exceed the limit.
>>
>> * Since Linux 3.5, there is a ceiling for this limit of
>> 1024 (HARD_QUEUESMAX). Privileged processes
>> (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
>> the HARD_QUEUESMAX limit is enforced even for privi‐
>> leged processes.
>>
>> * Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
>> removed: no ceiling is imposed on the queues_max limit,
>> and privileged processes (CAP_SYS_RESOURCE) can exceed
>> the limit.
>>
>
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> GPG KeyID: 0E572FDD
>
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
[not found] ` <1412022198.23497.23.camel-dxKd5G12XOI1EaDjlw0dpg@public.gmane.org>
@ 2014-09-30 10:49 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-30 10:49 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Doug Ledford,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins, Manfred Spraul
Hi David,
On 09/29/2014 10:23 PM, Davidlohr Bueso wrote:
> Hi Michael,
>
> Cc'ing Manfred.
(Thanks, I should have thought of that.)
> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
>
> Over the years posix mqueues have increasingly become a mess *sigh* :-/
> Thanks for doing this and untangling some of the historic changes.
>
>> From mq_overview(7) draft:
>>
>> /proc interfaces
>> The following interfaces can be used to limit the amount of ker‐
>> nel memory consumed by POSIX message queues and to set the
>> default attributes for new message queues:
>>
>> /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
>> This file defines the value used for a new queue's
>> mq_maxmsg setting when the queue is created with a call to
>> mq_open(3) where attr is specified as NULL. The default
>> value for this file is 10. The minimum and maximum are as
>> for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
>> msg_max, a new queue's default mq_maxmsg value is capped
>> to the msg_max limit.
>
> I think rephrasing this would read easier. Basically the behavior is
> this:
>
> info->attr.mq_maxmsg = min(ipc_ns->mq_msg_max,
> ipc_ns->mq_msg_default);
>
> Something like:
> "a new queue's default mq_maxmsg value will be the smallest of msg_default and msg_max"
Yes, better. Changed. Thanks.
>
>> Up until Linux 2.6.28, the default
>> mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
>> default was the value defined for the msg_max limit.
>>
>> /proc/sys/fs/mqueue/msg_max
>> This file can be used to view and change the ceiling value
>> for the maximum number of messages in a queue. This value
>> acts as a ceiling on the attr->mq_maxmsg argument given to
>> mq_open(3). The default value for msg_max is 10. The
>> minimum value is 1 (10 in kernels before 2.6.28). The
>> upper limit is HARD_MSGMAX. The msg_max limit is ignored
>> for privileged processes (CAP_SYS_RESOURCE), but the
>> HARD_MSGMAX ceiling is nevertheless imposed.
>
> Note that the HARD_MSGMAX check is done *only* for privileged processes,
> regular processes only check against namespace values. This is a pretty
> fundamental difference. The same goes of course for msgsize:
Yes, I understand. But the existing text still seems okay to me. The
thing is that HARD_MSGMAX is still, in effect, a limit for unprivileged
processes too, since it is a ceiling on 'msg_max'. See what I mean?
> if (capable(CAP_SYS_RESOURCE)) {
> if (attr->mq_maxmsg > HARD_MSGMAX ||
> attr->mq_msgsize > HARD_MSGSIZEMAX)
> return -EINVAL;
> } else {
> if (attr->mq_maxmsg > ipc_ns->mq_msg_max ||
> attr->mq_msgsize > ipc_ns->mq_msgsize_max)
> return -EINVAL;
> }
>
>
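The two points of view above can be reconciled in a small user-space sketch. It mirrors only the check quoted above and nothing else the kernel does; the function name check_mq_attr and its parameters are hypothetical, and the two constants are the since-3.5 values given in the draft text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Since-Linux-3.5 values, as stated in the draft man page text. */
#define HARD_MSGMAX     65536L
#define HARD_MSGSIZEMAX 16777216L

/*
 * Sketch (hypothetical helper) of the limit check quoted above.
 * msg_max and msgsize_max stand in for the per-namespace sysctl values.
 * Returns 0 if the requested attributes pass, -EINVAL otherwise.
 */
int check_mq_attr(bool privileged, long maxmsg, long msgsize,
                  long msg_max, long msgsize_max)
{
    /* Privileged callers are checked against the hard ceilings only. */
    long maxmsg_limit = privileged ? HARD_MSGMAX : msg_max;
    long msgsize_limit = privileged ? HARD_MSGSIZEMAX : msgsize_max;

    /*
     * The sysctls themselves cannot be raised above the hard ceilings,
     * which is why the ceilings also bound unprivileged callers in effect.
     */
    assert(msg_max <= HARD_MSGMAX && msgsize_max <= HARD_MSGSIZEMAX);

    if (maxmsg > maxmsg_limit || msgsize > msgsize_limit)
        return -EINVAL;
    return 0;
}
```

So a request for 11 messages with msg_max at 10 fails for an unprivileged caller and succeeds for a privileged one, while a request for 65537 messages fails for both.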
>> The definition of HARD_MSGMAX has changed across kernel
>> versions:
>>
>> * Up to Linux 2.6.32: 131072 / sizeof(void *)
>>
>> * Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
>>
>> * Since Linux 3.5: 65,536
>>
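To make the three definitions listed above concrete, here is a small sketch that evaluates them for a given pointer size. The enum and function names are made up for illustration; only the three formulas come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three kernel ranges listed above. */
enum kernel_era { UP_TO_2_6_32, FROM_2_6_33_TO_3_4, SINCE_3_5 };

/*
 * Sketch: value of HARD_MSGMAX for a kernel range and pointer size
 * (4 on 32-bit systems, 8 on 64-bit systems).
 */
long hard_msgmax(enum kernel_era era, size_t ptr_size)
{
    assert(ptr_size == 4 || ptr_size == 8);

    switch (era) {
    case UP_TO_2_6_32:
        return 131072L / (long)ptr_size;
    case FROM_2_6_33_TO_3_4:
        return 32768L * (long)ptr_size / 4;
    case SINCE_3_5:
        return 65536L;
    }
    return -1; /* not reached for valid enum values */
}
```

On a 64-bit system this gives 16384, 65536, and 65536 for the three ranges; on a 32-bit system it gives 32768, 32768, and 65536.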
>> /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
>
> You might want to mention the units (bytes) when referring to limits.
I added the words "bytes" in the text near here.
>> This file defines the value used for a new queue's mq_msg‐
>> size setting when the queue is created with a call to
>> mq_open(3) where attr is specified as NULL. The default
>> value for this file is 8192. The minimum and maximum are
>> as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
>> size_default exceeds msgsize_max, a new queue's default
>> mq_msgsize value is capped to the msgsize_max limit. Up
>> until Linux 2.6.28, the default mq_msgsize was 8192; from
>> Linux 2.6.28 to Linux 3.4, the default was the value
>> defined for the msgsize_max limit.
>>
>> /proc/sys/fs/mqueue/msgsize_max
>
> Ditto here.
("bytes" does already get mentioned below.)
>> This file can be used to view and change the ceiling on
>> the maximum message size. This value acts as a ceiling on
>> the attr->mq_msgsize argument given to mq_open(3). The
>> default value for msgsize_max is 8192 bytes. The minimum
>> value is 128 (8192 in kernels before 2.6.28). The upper
>> limit for msgsize_max has varied across kernel versions:
>>
>> * Before Linux 2.6.28, the upper limit is INT_MAX.
>>
>> * From Linux 2.6.28 to 3.4, the limit is 1,048,576.
>>
>> * Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
>> MAX).
>> The msgsize_max limit is ignored for privileged processes
>> (CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
>> SIZEMAX ceiling is enforced for privileged processes.
>>
>> /proc/sys/fs/mqueue/queues_max
>> This file can be used to view and change the system-wide
>> limit on the number of message queues that can be created.
>> The default value for queues_max is 256. The semantics of
>> this limit have changed across kernel versions as follows:
>>
>> * Before Linux 3.5, this limit could be changed to any
>> value in the range 0 to INT_MAX, but privileged pro‐
>> cesses (CAP_SYS_RESOURCE) can exceed the limit.
>>
>> * Since Linux 3.5, there is a ceiling for this limit of
>> 1024 (HARD_QUEUESMAX). Privileged processes
>> (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
>> the HARD_QUEUESMAX limit is enforced even for privi‐
>> leged processes.
>>
>> * Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
>> removed: no ceiling is imposed on the queues_max limit,
>> and privileged processes (CAP_SYS_RESOURCE) can exceed
>> the limit.
>
> Given that this was treated as a bug that breaks user-space, I don't
> think we really want to document the behavior between 3.5 and 3.14 (all
> three bullets). Stable kernels back to 3.5 now have the default
> behavior, so it's as if nothing ever changed(?). Now, if you explicitly want
> to document such a bug, I would agree, but just not by presenting it as an
> intentional difference in behavior. Does that make sense?
Yes. I've simplified that piece to just:
/proc/sys/fs/mqueue/queues_max
This file can be used to view and change the system-wide
limit on the number of message queues that can be created.
The default value for queues_max is 256. No ceiling is
imposed on the queues_max limit; privileged processes
(CAP_SYS_RESOURCE) can exceed the limit (but see BUGS).
plus:
BUGS
In Linux versions 3.5 to 3.13, the kernel imposed a ceiling of
1024 (HARD_QUEUESMAX) on the value to which the queues_max limit
could be raised, and the ceiling was enforced even for privileged
processes. This ceiling value was removed in Linux 3.14, and
patches to stable kernels 3.5.x to 3.13.x also removed the ceil‐
ing.
Okay?
Thanks for the careful review, David.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
2014-09-30 10:12 ` Michael Kerrisk (man-pages)
@ 2014-09-30 17:30 ` Davidlohr Bueso
2014-09-30 17:42 ` Davidlohr Bueso
[not found] ` <CAKgNAkj9+Z1yO-zrQ_qWFut7BqOkzNtNCruSSRyTKMpXZOcBaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-30 17:30 UTC (permalink / raw)
To: mtk.manpages
Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
Manfred Spraul, Andrew Morton
On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Doug,
>
> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford@redhat.com> wrote:
> > On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> >> Hello Doug, David,
> >>
> >> I think you two were the last ones to make significant
> >> changes to the semantics of the files in /proc/sys/fs/mqueue,
> >> so I wonder if you (or anyone else who is willing) might
> >> take a look at the man page text below that I've written
> >> (for the mq_overview(7) page) to describe past and current
> >> reality, and let me know of improvements or corrections.
> >>
> >> By the way, Doug, your commit ce2d52cc1364 appears to have
> >> changed/broken the semantics of the files in the /dev/mqueue
> >> filesystem. Formerly, the QSIZE field in these files showed
> >> the number of bytes of real user data in all of the queued
> >> messages. After that commit, QSIZE now includes kernel
> >> overhead bytes, which does not seem very useful for user
> >> space. Was that change intentional? I see no mention of the
> >> change in the commit message, so it sounds like it was not
> >> intended.
> >
> > That change didn't come in that commit. That commit modified it, but
> > didn't introduce it.
>
> (Which commit was it then? d6629859b36 ?)
By just looking at msg_insert and msg_get, I think so, yeah.
>
> > Now, was it intentional? Yes. Is it valuable, useful? That depends on
> > your perspective.
>
> Thanks for the detailed explanation below. However, I don't understand
> why the (useful) work that you describe below necessitated a change in
> the QSIZE value that was exposed to user space. Surely the necessary
> changes could have been done internally while still leaving QSIZE to
> expose the same value it ever did? As things stand now (and unless I
> am missing something), QSIZE exposes an implementation-specific
> internal value that has little meaning or value to user space. And,
> it's unfortunate that the commit message made no mention of the fact
> that there was an ABI change here.
Agreed. And this needs to be changed back -- *although* there have been
0 bug reports afaict. Probably similarly to what we did with the
queues_max issue: stable since v3.5. Doug, any thoughts?
Thanks,
Davidlohr
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
2014-09-30 17:30 ` Davidlohr Bueso
@ 2014-09-30 17:42 ` Davidlohr Bueso
0 siblings, 0 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-30 17:42 UTC (permalink / raw)
To: mtk.manpages
Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
Manfred Spraul, Andrew Morton
On Tue, 2014-09-30 at 10:30 -0700, Davidlohr Bueso wrote:
> Agreed. And this needs to be changed back -- *although* there have been
> 0 bug reports afaict. Probably similarly to what we did with the
> queues_max issue: stable since v3.5. Doug, any thoughts?
Note that by changing back, I don't mean reverting your patches, just
not exporting the extra bits to QSIZE.
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
[not found] ` <CAKgNAkj9+Z1yO-zrQ_qWFut7BqOkzNtNCruSSRyTKMpXZOcBaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-09-30 19:57 ` Doug Ledford
[not found] ` <1412107074.4930.9.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Doug Ledford @ 2014-09-30 19:57 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Davidlohr Bueso,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins
On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Doug,
>
> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> >> Hello Doug, David,
> >>
> >> I think you two were the last ones to make significant
> >> changes to the semantics of the files in /proc/sys/fs/mqueue,
> >> so I wonder if you (or anyone else who is willing) might
> >> take a look at the man page text below that I've written
> >> (for the mq_overview(7) page) to describe past and current
> >> reality, and let me know of improvements or corrections.
> >>
> >> By the way, Doug, your commit ce2d52cc1364 appears to have
> >> changed/broken the semantics of the files in the /dev/mqueue
> >> filesystem. Formerly, the QSIZE field in these files showed
> >> the number of bytes of real user data in all of the queued
> >> messages. After that commit, QSIZE now includes kernel
> >> overhead bytes, which does not seem very useful for user
> >> space. Was that change intentional? I see no mention of the
> >> change in the commit message, so it sounds like it was not
> >> intended.
> >
> > That change didn't come in that commit. That commit modified it, but
> > didn't introduce it.
>
> (Which commit was it then? d6629859b36 ?)
Yes, that's the one.
> > Now, was it intentional? Yes. Is it valuable, useful? That depends on
> > your perspective.
>
> Thanks for the detailed explanation below. However, I don't understand
> why the (useful) work that you describe below necessitated a change in
> the QSIZE value that was exposed to user space.
Given how long ago this was, I can't say for sure, old age and memory
being what it is ;-) Most likely, when I rewrote the msg_insert
routine, I saw we were updating info->qsize and said to myself "Crap,
I've added a new structure, we have to account for it too" and made the
change.
> Surely the necessary
> changes could have been done internally while still leaving QSIZE to
> expose the same value it ever did?
Yes, it could have.
> As things stand now (and unless I
> am missing something), QSIZE exposes an implementation-specific
> internal value that has little meaning or value to user space.
This part is not necessarily true. I'm pretty sure at the time I
thought the struct msg_msg was also included in qsize (even though it
isn't). And although we've not had any reports of bugs on this, I have
a Red Hat bug against the accounting change (namely that it caught one
user off guard that they needed to increase their RLIMIT_MSGQUEUE to
create the same number/size of queues they used to be able to create)
and so it does have some value in that it's the only way a user has of
knowing just how much the overhead of their queue is biting them in the
ass in terms of that RLIMIT_MSGQUEUE test. But, since it doesn't
include the size of each struct msg_msg, it's incomplete even for that
purpose. Like I said in my previous email, I'm not so sure it wouldn't
be wise to include some extra data in this file (but that again would be
an ABI break). Maybe a second line that includes something like this:
CUR_OVERHEAD: # RLIM_OVERHEAD: # RLIM_PAYLOAD: #
where CUR_OVERHEAD is how much we currently have allocated in internal
kernel structures for the current DATA on the line above, and the other
two are the amount of size we charged against the RLIMIT_MSGQUEUE
available to the user based upon their queue parameters and the
potential worst case scenario of queue usage.
> And,
> it's unfortunate that the commit message made no mention of the fact
> that there was an ABI change here.
I don't think I realized it was an ABI change at the time.
> [...]
>
> > The man page below looks fine to me.
>
> Thanks for checking it!
>
> Cheers,
>
> Michael
>
>
> > It covers the various
> > incarnations. If I add some tweaks to the priorities value though, it
> > will need updating again ;-)
> >
> > Although this section wasn't included below, I would update how the
> > memory is calculated to match what I wrote above. However, I would also
> > put in a notation that the calculation can change when the kernel's
> > internal implementation changes and resource usage therefore changes.
> >
> >> Cheers,
> >>
> >> Michael
> >>
> >> From mq_overview(7) draft:
> >>
> >> /proc interfaces
> >> The following interfaces can be used to limit the amount of ker‐
> >> nel memory consumed by POSIX message queues and to set the
> >> default attributes for new message queues:
> >>
> >> /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
> >> This file defines the value used for a new queue's
> >> mq_maxmsg setting when the queue is created with a call to
> >> mq_open(3) where attr is specified as NULL. The default
> >> value for this file is 10. The minimum and maximum are as
> >> for /proc/sys/fs/mqueue/msg_max. If msg_default exceeds
> >> msg_max, a new queue's default mq_maxmsg value is capped
> >> to the msg_max limit. Up until Linux 2.6.28, the default
> >> mq_maxmsg was 10; from Linux 2.6.28 to Linux 3.4, the
> >> default was the value defined for the msg_max limit.
> >>
> >> /proc/sys/fs/mqueue/msg_max
> >> This file can be used to view and change the ceiling value
> >> for the maximum number of messages in a queue. This value
> >> acts as a ceiling on the attr->mq_maxmsg argument given to
> >> mq_open(3). The default value for msg_max is 10. The
> >> minimum value is 1 (10 in kernels before 2.6.28). The
> >> upper limit is HARD_MSGMAX. The msg_max limit is ignored
> >> for privileged processes (CAP_SYS_RESOURCE), but the
> >> HARD_MSGMAX ceiling is nevertheless imposed.
> >>
> >> The definition of HARD_MSGMAX has changed across kernel
> >> versions:
> >>
> >> * Up to Linux 2.6.32: 131072 / sizeof(void *)
> >>
> >> * Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
> >>
> >> * Since Linux 3.5: 65,536
> >>
> >> /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
> >> This file defines the value used for a new queue's mq_msg‐
> >> size setting when the queue is created with a call to
> >> mq_open(3) where attr is specified as NULL. The default
> >> value for this file is 8192. The minimum and maximum are
> >> as for /proc/sys/fs/mqueue/msgsize_max. If msg‐
> >> size_default exceeds msgsize_max, a new queue's default
> >> mq_msgsize value is capped to the msgsize_max limit. Up
> >> until Linux 2.6.28, the default mq_msgsize was 8192; from
> >> Linux 2.6.28 to Linux 3.4, the default was the value
> >> defined for the msgsize_max limit.
> >>
> >> /proc/sys/fs/mqueue/msgsize_max
> >> This file can be used to view and change the ceiling on
> >> the maximum message size. This value acts as a ceiling on
> >> the attr->mq_msgsize argument given to mq_open(3). The
> >> default value for msgsize_max is 8192 bytes. The minimum
> >> value is 128 (8192 in kernels before 2.6.28). The upper
> >> limit for msgsize_max has varied across kernel versions:
> >>
> >> * Before Linux 2.6.28, the upper limit is INT_MAX.
> >>
> >> * From Linux 2.6.28 to 3.4, the limit is 1,048,576.
> >>
> >> * Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
> >> MAX).
> >>
> >> The msgsize_max limit is ignored for privileged processes
> >> (CAP_SYS_RESOURCE), but, since Linux 3.5, the HARD_MSG‐
> >> SIZEMAX ceiling is enforced for privileged processes.
> >>
> >> /proc/sys/fs/mqueue/queues_max
> >> This file can be used to view and change the system-wide
> >> limit on the number of message queues that can be created.
> >> The default value for queues_max is 256. The semantics of
> >> this limit have changed across kernel versions as follows:
> >>
> >> * Before Linux 3.5, this limit could be changed to any
> >> value in the range 0 to INT_MAX, but privileged pro‐
> >> cesses (CAP_SYS_RESOURCE) can exceed the limit.
> >>
> >> * Since Linux 3.5, there is a ceiling for this limit of
> >> 1024 (HARD_QUEUESMAX). Privileged processes
> >> (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
> >> the HARD_QUEUESMAX limit is enforced even for privi‐
> >> leged processes.
> >>
> >> * Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
> >> removed: no ceiling is imposed on the queues_max limit,
> >> and privileged processes (CAP_SYS_RESOURCE) can exceed
> >> the limit.
> >>
> >
> >
> > --
> > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > GPG KeyID: 0E572FDD
> >
> >
>
>
>
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
[not found] ` <1412107074.4930.9.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
@ 2014-10-01 8:19 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-01 8:19 UTC (permalink / raw)
To: Doug Ledford
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Davidlohr Bueso,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins
Hi Doug,
On 09/30/2014 09:57 PM, Doug Ledford wrote:
> On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Doug,
>>
>> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>>>> Hello Doug, David,
>>>>
>>>> I think you two were the last ones to make significant
>>>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>>>> so I wonder if you (or anyone else who is willing) might
>>>> take a look at the man page text below that I've written
>>>> (for the mq_overview(7) page) to describe past and current
>>>> reality, and let me know of improvements or corrections.
>>>>
>>>> By the way, Doug, your commit ce2d52cc1364 appears to have
>>>> changed/broken the semantics of the files in the /dev/mqueue
>>>> filesystem. Formerly, the QSIZE field in these files showed
>>>> the number of bytes of real user data in all of the queued
>>>> messages. After that commit, QSIZE now includes kernel
>>>> overhead bytes, which does not seem very useful for user
>>>> space. Was that change intentional? I see no mention of the
>>>> change in the commit message, so it sounds like it was not
>>>> intended.
>>>
>>> That change didn't come in that commit. That commit modified it, but
>>> didn't introduce it.
>>
>> (Which commit was it then? d6629859b36 ?)
>
> Yes, that's the one.
>
>>> Now, was it intentional? Yes. Is it valuable, useful? That depends on
>>> your perspective.
>>
>> Thanks for the detailed explanation below. However, I don't understand
>> why the (useful) work that you describe below necessitated a change in
>> the QSIZE value that was exposed to user space.
>
> Given how long ago this was, I can't say for sure, old age and memory
> being what it is ;-) Most likely, when I rewrote the msg_insert
> routine, I saw we were updating info->qsize and said to myself "Crap,
> I've added a new structure, we have to account for it too" and made the
> change.
>
>> Surely the necessary
>> changes could have been done internally while still leaving QSIZE to
>> expose the same value it ever did?
>
> Yes, it could have.
>
>> As things stand now (and unless I
>> am missing something), QSIZE exposes an implementation-specific
>> internal value that has little meaning or value to user space.
>
> This part is not necessarily true. I'm pretty sure at the time I
> thought the struct msg_msg was also included in qsize (even though it
> isn't). And although we've not had any reports of bugs on this, I have
> a Red Hat bug against the accounting change (namely that it caught one
> user off guard that they needed to increase their RLIMIT_MSGQUEUE to
> create the same number/size of queues they used to be able to create)
> and so it does have some value in that it's the only way a user has of
> knowing just how much the overhead of their queue is biting them in the
> ass in terms of that RLIMIT_MSGQUEUE test. But, since it doesn't
> include the size of each struct msg_msg, it's incomplete even for that
> purpose. Like I said in my previous email, I'm not so sure it wouldn't
> be wise to include some extra data in this file (but that again would be
> an ABI break). Maybe a second line that includes something like this:
>
> CUR_OVERHEAD: # RLIM_OVERHEAD: # RLIM_PAYLOAD: #
>
> where CUR_OVERHEAD is how much we currently have allocated in internal
> kernel structures for the current DATA on the line above, and the other
> two are the amount of size we charged against the RLIMIT_MSGQUEUE
> available to the user based upon their queue parameters and the
> potential worst case scenario of queue usage.
>
>> And,
>> it's unfortunate that the commit message made no mention of the fact
>> that there was an ABI change here.
>
> I don't think I realized it was an ABI change at the time.
So, to summarize:
* QSIZE returning a count of the user data bytes in the queue was
the actual (and intended and documented) behavior from Linux
2.6.6 to 3.4.
* Linux 3.5 changed the value exposed by QSIZE to something
that more closely matches the amount of memory
consumed by the kernel implementation. However:
-- That change broke the ABI.
-- The newly exposed value still doesn't match the
consumed memory as accounted against RLIMIT_MSGQUEUE,
so it's still not really useful.
* No-one complained about the QSIZE ABI change yet (well, except
me), but that doesn't mean no-one has been bitten. After
all, it took a while for reports about the HARD_QUEUESMAX
breakage to filter through.
I think QSIZE really should be fixed to expose the same value
it used to expose, which is the real number of user data bytes
in the queue. I'm agnostic on whether or not further fields
along the lines you suggest should be added to the /dev/mqueue
files. In my opinion, that's an ABI extension, but not a breakage:
those files have been designed for easy parsing with fields of
the form "name:value", and properly designed applications
won't be tripped up by extensions to the format.
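As an illustration of that last point, here is a rough sketch of a parser that looks up one "name:value" field by name and skips any field it does not recognize, so extra fields added later would not disturb it. The function name mq_field is made up for this example, and the sketch assumes whitespace-separated fields with decimal integer values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch (hypothetical helper): find "name:value" in a line made of
 * whitespace-separated fields and store the decimal value in *value.
 * Unknown fields are skipped. Returns 0 on success, -1 if the field is
 * missing or its value is not a plain integer.
 */
int mq_field(const char *line, const char *name, long *value)
{
    size_t name_len;
    const char *p;

    assert(line != NULL && name != NULL && value != NULL);
    name_len = strlen(name);
    p = line;

    while (*p != '\0') {
        size_t tok_len;

        p += strspn(p, " \t\n");        /* skip separators */
        tok_len = strcspn(p, " \t\n");  /* length of this field */

        /* Match "name:" exactly, with at least one character after it. */
        if (tok_len > name_len + 1 &&
            strncmp(p, name, name_len) == 0 && p[name_len] == ':') {
            char *end;
            long v = strtol(p + name_len + 1, &end, 10);

            if (end != p + tok_len)
                return -1;              /* trailing junk in the value */
            *value = v;
            return 0;
        }
        p += tok_len;
    }
    return -1;
}
```

Given a line such as "QSIZE:129 NOTIFY:2 SIGNO:0 NOTIFY_PID:8260", looking up QSIZE yields 129, and the result is the same if further fields are appended to the line.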
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: Document POSIX MQ /proc/sys/fs/mqueue files
[not found] ` <1412011687.15492.39.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-09-30 10:12 ` Michael Kerrisk (man-pages)
@ 2014-10-01 10:02 ` Michael Kerrisk (man-pages)
1 sibling, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-01 10:02 UTC (permalink / raw)
To: Doug Ledford
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Davidlohr Bueso,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
Madars Vitolins
On 09/29/2014 07:28 PM, Doug Ledford wrote:
> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
>>
>> By the way, Doug, your commit ce2d52cc1364 appears to have
>> changed/broken the semantics of the files in the /dev/mqueue
>> filesystem. Formerly, the QSIZE field in these files showed
>> the number of bytes of real user data in all of the queued
>> messages. After that commit, QSIZE now includes kernel
>> overhead bytes, which does not seem very useful for user
>> space. Was that change intentional? I see no mention of the
>> change in the commit message, so it sounds like it was not
>> intended.
>
> That change didn't come in that commit. That commit modified it, but
> didn't introduce it.
>
> Now, was it intentional? Yes. Is it valuable, useful? That depends on
> your perspective.
>
> One of the problems I ran into with that code relates to the rlimit
> checks that happen at queue creation time. We used to check to see if
>
> msg_num * (msg_size + sizeof struct msg_msg *)
>
> would fit within the user's currently available rlimit for
> RLIMIT_MSGQUEUE. This was not an accurate check though. It accounted
> for the msg number, and the payload size, and the array of pointers we
> used to point to the msg_msg structs that held each message, but ignored
> the msg_msg structs themselves. Given that we accept the creation of
> message queues with a msg_size of 1, this could be used to create a
> minor DoS because of the fact that there was such a large size
> difference between the sizeof struct msg_msg and the size of our
> messages. In this scenario, a msg_size of 1 would result in us
> accounting 9/5 bytes per message on 64bit/32bit OSes respectively, but
> actually using 49 bytes/19 bytes respectively. That's a 4:1 ratio in the
> worst case for the difference between actual memory used and memory usage
> accounted against the RLIMIT_MSGQUEUE limit. So before I ever got around
> to doing the rbtree update, I fixed this to at least be more accurate
> and it became
>
> msg_num * (msg_size + sizeof struct msg_msg * + sizeof struct msg_msg)
>
> Even this wasn't totally accurate though, as large messages could result
> in the allocation of additional msg_msgseg segments. However, I ignored
> that inaccuracy because once the message size is large enough to need
> additional SG segments, we are no longer in danger of any sort of minor
> DoS because our own overhead will become nothing more than noise to the
> calculation.
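The two calculations described above can be written out as a short sketch. The function names are invented for this example, and because struct msg_msg is a kernel-internal structure whose size is not available to user space, its size is passed in as a parameter (msg_msg_size) that the caller has to assume. Overflow for very large arguments is not handled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the original check:
 *   msg_num * (msg_size + sizeof(struct msg_msg *))
 * ptr_size is 4 on 32-bit systems and 8 on 64-bit systems.
 */
unsigned long mq_bytes_old(unsigned long msg_num, unsigned long msg_size,
                           size_t ptr_size)
{
    assert(ptr_size == 4 || ptr_size == 8);
    return msg_num * (msg_size + ptr_size);
}

/*
 * Sketch of the revised check:
 *   msg_num * (msg_size + sizeof(struct msg_msg *) + sizeof(struct msg_msg))
 * msg_msg_size is an assumed value supplied by the caller.
 */
unsigned long mq_bytes_revised(unsigned long msg_num, unsigned long msg_size,
                               size_t ptr_size, size_t msg_msg_size)
{
    assert(ptr_size == 4 || ptr_size == 8);
    return msg_num * (msg_size + ptr_size + msg_msg_size);
}
```

For a single 1-byte message, mq_bytes_old gives 9 bytes on 64-bit and 5 bytes on 32-bit, which are the accounted figures mentioned above; the revised form adds the per-message structure size on top of that.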
So, for what it's worth, I applied the following patch in getrlimit.2
to describe the post 3.5 behavior. Look okay?
Cheers,
Michael
diff --git a/man2/getrlimit.2 b/man2/getrlimit.2
index 91fed13..a3e4285 100644
--- a/man2/getrlimit.2
+++ b/man2/getrlimit.2
@@ -250,8 +250,19 @@ Each message queue that the user creates counts (until it is removed)
against this limit according to the formula:
.nf
- bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) +
- attr.mq_maxmsg * attr.mq_msgsize
+ Since Linux 3.5:
+ bytes = attr.mq_maxmsg * sizeof(struct msg_msg) +
+ min(attr.mq_maxmsg, MQ_PRIO_MAX) *
+ sizeof(struct posix_msg_tree_node)+
+ /* For overhead */
+ attr.mq_maxmsg * attr.mq_msgsize;
+ /* For message data */
+
+ Linux 3.4 and earlier:
+ bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) +
+ /* For overhead */
+ attr.mq_maxmsg * attr.mq_msgsize;
+ /* For message data */
.fi
where
@@ -259,11 +270,16 @@ where
is the
.I mq_attr
structure specified as the fourth argument to
-.BR mq_open (3).
+.BR mq_open (3),
+and the
+.I msg_msg
+and
+.I posix_msg_tree_node
+structures are kernel-internal structures.
-The first addend in the formula, which includes
-.I "sizeof(struct msg_msg\ *)"
-(4 bytes on Linux/i386), ensures that the user cannot
+The "overhead" addend in the formula accounts for overhead
+bytes required by the implementation
+and ensures that the user cannot
create an unlimited number of zero-length messages (such messages
nevertheless each consume some system memory for bookkeeping overhead).
.TP
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
end of thread, other threads:[~2014-10-01 10:02 UTC | newest]
Thread overview: 10+ messages
2014-09-29 9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
2014-09-29 17:28 ` Doug Ledford
[not found] ` <1412011687.15492.39.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-09-30 10:12 ` Michael Kerrisk (man-pages)
2014-09-30 17:30 ` Davidlohr Bueso
2014-09-30 17:42 ` Davidlohr Bueso
[not found] ` <CAKgNAkj9+Z1yO-zrQ_qWFut7BqOkzNtNCruSSRyTKMpXZOcBaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-30 19:57 ` Doug Ledford
[not found] ` <1412107074.4930.9.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-10-01 8:19 ` Michael Kerrisk (man-pages)
2014-10-01 10:02 ` Michael Kerrisk (man-pages)
2014-09-29 20:23 ` Davidlohr Bueso
[not found] ` <1412022198.23497.23.camel-dxKd5G12XOI1EaDjlw0dpg@public.gmane.org>
2014-09-30 10:49 ` Michael Kerrisk (man-pages)