linux-kernel.vger.kernel.org archive mirror
* Document POSIX MQ /proc/sys/fs/mqueue files
@ 2014-09-29  9:10 Michael Kerrisk (man-pages)
  2014-09-29 17:28 ` Doug Ledford
  2014-09-29 20:23 ` Davidlohr Bueso
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-29  9:10 UTC (permalink / raw)
  To: Davidlohr Bueso, Doug Ledford
  Cc: mtk.manpages, linux-man@vger.kernel.org, lkml, Madars Vitolins

Hello Doug, David,

I think you two were the last ones to make significant 
changes to the semantics of the files in /proc/sys/fs/mqueue,
so I wonder if you (or anyone else who is willing) might
take a look at the man page text below that I've written
(for the mq_overview(7) page) to describe past and current
reality, and let me know of improvements or corrections.

By the way, Doug, your commit ce2d52cc1364 appears to have
changed/broken the semantics of the files in the /dev/mqueue 
filesystem. Formerly, the QSIZE field in these files showed
the number of bytes of real user data in all of the queued
messages. After that commit, QSIZE now includes kernel 
overhead bytes, which does not seem very useful for user 
space. Was that change intentional? I see no mention of the
change in the commit message, so it sounds like it was not 
intended.

Cheers,

Michael

From mq_overview(7) draft:

   /proc interfaces
       The following interfaces can be used to limit the amount of  ker‐
       nel  memory  consumed  by  POSIX  message  queues  and to set the
       default attributes for new message queues:

       /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
              This file  defines  the  value  used  for  a  new  queue's
              mq_maxmsg setting when the queue is created with a call to
              mq_open(3) where attr is specified as NULL.   The  default
              value for this file is 10.  The minimum and maximum are as
              for /proc/sys/fs/mqueue/msg_max.  If  msg_default  exceeds
              msg_max,  a  new queue's default mq_maxmsg value is capped
              to the msg_max limit.  Up until Linux 2.6.28, the  default
              mq_maxmsg  was  10;  from  Linux  2.6.28 to Linux 3.4, the
              default was the value defined for the msg_max limit.

       /proc/sys/fs/mqueue/msg_max
              This file can be used to view and change the ceiling value
              for the maximum number of messages in a queue.  This value
              acts as a ceiling on the attr->mq_maxmsg argument given to
              mq_open(3).   The  default  value  for msg_max is 10.  The
              minimum value is 1 (10 in  kernels  before  2.6.28).   The
              upper  limit is HARD_MSGMAX.  The msg_max limit is ignored
              for  privileged  processes  (CAP_SYS_RESOURCE),  but   the
              HARD_MSGMAX ceiling is nevertheless imposed.

              The  definition  of  HARD_MSGMAX has changed across kernel
              versions:

              *  Up to Linux 2.6.32: 131072 / sizeof(void *)

              *  Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)

              *  Since Linux 3.5: 65,536
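
As a rough illustration of the three definitions listed above, the following sketch evaluates each one for a given pointer size. The enum and function names are hypothetical and exist only for this example; the arithmetic is taken directly from the list.

```c
#include <stddef.h>

/* Hypothetical labels for the three kernel ranges listed above. */
enum mq_era {
    MQ_UP_TO_2_6_32,
    MQ_2_6_33_TO_3_4,
    MQ_SINCE_3_5
};

/*
 * Evaluate the HARD_MSGMAX definition for a given era.
 * ptr_size stands in for sizeof(void *) on the target
 * architecture (4 on 32-bit, 8 on 64-bit).
 */
unsigned long hard_msgmax(enum mq_era era, size_t ptr_size)
{
    switch (era) {
    case MQ_UP_TO_2_6_32:
        return 131072UL / ptr_size;
    case MQ_2_6_33_TO_3_4:
        return 32768UL * ptr_size / 4;
    default:
        return 65536UL;
    }
}
```

With an 8-byte pointer the first definition gives 16384 and the second 65536; with a 4-byte pointer both give 32768.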

       /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
              This file defines the value used for a new queue's mq_msg‐
              size  setting  when  the  queue  is created with a call to
              mq_open(3) where attr is specified as NULL.   The  default
              value  for this file is 8192.  The minimum and maximum are
              as   for   /proc/sys/fs/mqueue/msgsize_max.     If    msg‐
              size_default  exceeds  msgsize_max,  a new queue's default
              mq_msgsize value is capped to the msgsize_max  limit.   Up
              until  Linux 2.6.28, the default mq_msgsize was 8192; from
              Linux 2.6.28 to Linux  3.4,  the  default  was  the  value
              defined for the msgsize_max limit.

       /proc/sys/fs/mqueue/msgsize_max
              This  file  can  be used to view and change the ceiling on
              the maximum message size.  This value acts as a ceiling on
              the  attr->mq_msgsize  argument  given to mq_open(3).  The
              default value for msgsize_max is 8192 bytes.  The  minimum
              value  is  128 (8192 in kernels before 2.6.28).  The upper
              limit for msgsize_max has varied across kernel versions:

              *  Before Linux 2.6.28, the upper limit is INT_MAX.

              *  From Linux 2.6.28 to 3.4, the limit is 1,048,576.

              *  Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
                 MAX).

              The msgsize_max limit is ignored for privileged processes
              (CAP_SYS_RESOURCE), but, since Linux 3.5, the
              HARD_MSGSIZEMAX ceiling is enforced for privileged processes.

       /proc/sys/fs/mqueue/queues_max
              This  file  can be used to view and change the system-wide
              limit on the number of message queues that can be created.
              The default value for queues_max is 256.  The semantics of
              this limit have changed across kernel versions as follows:

              *  Before Linux 3.5, this limit could be  changed  to  any
                 value  in  the  range 0 to INT_MAX, but privileged pro‐
                 cesses (CAP_SYS_RESOURCE) can exceed the limit.

              *  Since Linux 3.5, there is a ceiling for this  limit  of
                 1024     (HARD_QUEUESMAX).      Privileged    processes
                 (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
                 the  HARD_QUEUESMAX  limit  is enforced even for privi‐
                 leged processes.

              *  Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
                 removed: no ceiling is imposed on the queues_max limit,
                 and privileged processes (CAP_SYS_RESOURCE) can  exceed
                 the limit.

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-29  9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
@ 2014-09-29 17:28 ` Doug Ledford
  2014-09-30 10:12   ` Michael Kerrisk (man-pages)
  2014-10-01 10:02   ` Michael Kerrisk (man-pages)
  2014-09-29 20:23 ` Davidlohr Bueso
  1 sibling, 2 replies; 10+ messages in thread
From: Doug Ledford @ 2014-09-29 17:28 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Davidlohr Bueso, linux-man@vger.kernel.org, lkml, Madars Vitolins


On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> Hello Doug, David,
> 
> I think you two were the last ones to make significant 
> changes to the semantics of the files in /proc/sys/fs/mqueue,
> so I wonder if you (or anyone else who is willing) might
> take a look at the man page text below that I've written
> (for the mq_overview(7) page) to describe past and current
> reality, and let me know of improvements or corrections.
> 
> By the way, Doug, your commit ce2d52cc1364 appears to have
> changed/broken the semantics of the files in the /dev/mqueue 
> filesystem. Formerly, the QSIZE field in these files showed
> the number of bytes of real user data in all of the queued
> messages. After that commit, QSIZE now includes kernel 
> overhead bytes, which does not seem very useful for user 
> space. Was that change intentional? I see no mention of the
> change in the commit message, so it sounds like it was not 
> intended.

That change didn't come in that commit.  That commit modified it, but
didn't introduce it.

Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
your perspective.

One of the problems I ran into with that code relates to the rlimit
checks that happen at queue creation time.  We used to check to see if

 msg_num * (msg_size + sizeof(struct msg_msg *))

would fit within the user's currently available rlimit for
RLIMIT_MSGQUEUE.  This was not an accurate check though.  It accounted
for the msg number, and the payload size, and the array of pointers we
used to point to the msg_msg structs that held each message, but ignored
the msg_msg structs themselves.  Given that we accept the creation of
message queues with a msg_size of 1, this could be used to create a
minor DoS because of the fact that there was such a large size
difference between the sizeof struct msg_msg and the size of our
messages.  In this scenario, a msg_size of 1 would result in us
accounting 9/5 bytes per message on 64-bit/32-bit OSes respectively, but
actually using 49/19 bytes respectively.  That's a 4:1 ratio at the
worst case for the difference between actual memory used and memory usage
accounted against the RLIMIT_MSGQUEUE limit. So before I ever got around
to doing the rbtree update, I fixed this to at least be more accurate
and it became

 msg_num * (msg_size + sizeof(struct msg_msg *) + sizeof(struct msg_msg))

Even this wasn't totally accurate though, as large messages could result
in the allocation of additional msg_msgseg segments.  However, I ignored
that inaccuracy because once the message size is large enough to need
additional SG segments, we are no longer in danger of any sort of minor
DoS because our own overhead will become nothing more than noise to the
calculation.
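
A minimal sketch of the two accounting formulas above, for the 64-bit case only. The sizes are assumptions inferred from the figures in this message (an 8-byte pointer, and 49 - 9 = 40 bytes for struct msg_msg); the macro and function names are hypothetical and are not the kernel's.

```c
/* Assumed 64-bit sizes, inferred from the 9-byte and 49-byte figures above. */
#define PTR_SIZE      8UL   /* sizeof(struct msg_msg *) */
#define MSG_MSG_SIZE 40UL   /* sizeof(struct msg_msg)   */

/* Original check: payload plus one array pointer per message. */
unsigned long mq_bytes_old(unsigned long msg_num, unsigned long msg_size)
{
    return msg_num * (msg_size + PTR_SIZE);
}

/* Revised check: also charges one struct msg_msg per message. */
unsigned long mq_bytes_fixed(unsigned long msg_num, unsigned long msg_size)
{
    return msg_num * (msg_size + PTR_SIZE + MSG_MSG_SIZE);
}
```

For a single 1-byte message, mq_bytes_old(1, 1) yields 9 and mq_bytes_fixed(1, 1) yields 49, matching the per-message numbers quoted above. For 8192-byte messages the two results differ by less than one percent, which is why the overhead only matters for small messages.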

When I then changed things to use rbtrees, I again updated the way we
calculate memory consumed by a queue.  The rbtrees are used one per
priority with a list head attached to our rbtree node so that once we
locate our given priority, we have O(1) insertion and removal of
messages.  It just so happens that, sometime long ago, someone set our
maximum number of priorities we support in Linux at 32768.  This kills
us on our memory calculations because the size of the msg_tree_node
struct is another 40 bytes on 64bit.  That means if someone creates a
message queue with 32768 max_msgs, and a msg_size of 1, they can cause
us to allocate 32768 struct msg_msg, 32768 struct posix_msg_tree_node,
and 32768 * 1 payload.  In order to protect against that sort of
exploitation, the new memory usage calculation had to become:

 msg_num * (msg_size + sizeof(struct msg_msg)) +
   sizeof(struct posix_msg_tree_node) * min(msg_num, max_priorities)

So, that's how we now calculate the size of a queue when checking it
against RLIMIT_MSGQUEUE to see if the user has the ability to create a
new queue.  This is now reasonably accurate, and it closes up what would
have been a minimum of an order of magnitude error between the worst
case scenario's actual memory usage and accounted memory usage.
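
The rbtree-era calculation above can be sketched as follows. The 40-byte tree node size on 64-bit and the 32768 priority count come from this message; the 40-byte struct msg_msg size is an assumption inferred from the earlier 49-byte figure, and all names here are hypothetical.

```c
/* Assumed 64-bit sizes; see the hedges in the paragraph above. */
#define MSG_MSG_SIZE    40UL     /* sizeof(struct msg_msg), inferred      */
#define TREE_NODE_SIZE  40UL     /* sizeof(struct posix_msg_tree_node)    */
#define MAX_PRIORITIES  32768UL  /* number of supported priorities        */

/*
 * Bytes charged against RLIMIT_MSGQUEUE for a queue: payload and
 * msg_msg header per message, plus one tree node per priority that
 * could be in use (at most one per message).
 */
unsigned long mq_bytes_rbtree(unsigned long msg_num, unsigned long msg_size)
{
    unsigned long nodes = msg_num < MAX_PRIORITIES ? msg_num : MAX_PRIORITIES;

    return msg_num * (msg_size + MSG_MSG_SIZE) + TREE_NODE_SIZE * nodes;
}
```

Under these assumptions, a queue of 32768 one-byte messages is charged 2,654,208 bytes, compared with 294,912 bytes (32768 * 9) under the original pointer-only formula.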

With this change in place, people that used to be able to allocate lots
of large queues of very small messages suddenly needed to adjust their
RLIMIT_MSGQUEUE to be able to continue.  I contend this is the right
thing, but it is a surprise to some people.  At the time, I had thought
that the sizeof struct msg_msg was already accounted for in the QSIZE
output.  So I had added the rbtree size in too so that users could see
their currently used memory more accurately.  Going back and looking
now, that was a mistake on my part as the size of struct msg_msg is not
included in that number, so it wasn't correct to add the rbtree size
there either (or at a minimum if I was going to add one, I should have
added both, but this in-between land makes no sense).  However, I think
it's probably worth adding a new field to the end of that data output
that does reflect both struct msg_msg and struct posix_msg_tree_node
allocations so that users can see the overhead of their current queue
usage, especially in light of the changes to how the rlimit is enforced.
And I would say that putting the data element back to an exact match to
the number of user data bytes currently in queue makes sense.

I've been trying to think of a way to tackle the priorities problem
anyway.  That we have a default, and unchangeable, setting of 32768
priorities precludes having lots of small messages in queue without
having to plan for huge amounts of overhead.  I think it's worth
investigating some method of allowing the supported number of priorities
for queues (either system wide or per namespace or per queue) to be
reduced in the name of efficiency.  I can bump that work up my priority
list and take care of fixing up the DATA field at the same time.

The man page below looks fine to me.  It covers the various
incarnations.  If I add some tweaks to the priorities value though, it
will need updating again ;-)

Although this section wasn't included below, I would update how the
memory is calculated to match what I wrote above.  However, I would also
put in a notation that the calculation can change when the kernel's
internal implementation changes and resource usage therefore changes.

> Cheers,
> 
> Michael
> 
> From mq_overview(7) draft:
> 
> [...]


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD





* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-29  9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
  2014-09-29 17:28 ` Doug Ledford
@ 2014-09-29 20:23 ` Davidlohr Bueso
  2014-09-30 10:49   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-29 20:23 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
	Manfred Spraul

Hi Michael,

Cc'ing Manfred.

On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> Hello Doug, David,
> 
> I think you two were the last ones to make significant 
> changes to the semantics of the files in /proc/sys/fs/mqueue,
> so I wonder if you (or anyone else who is willing) might
> take a look at the man page text below that I've written
> (for the mq_overview(7) page) to describe past and current
> reality, and let me know of improvements or corrections.

Over the years posix mqueues have increasingly become a mess *sigh* :-/
Thanks for doing this and untangling some of the historic changes.

> From mq_overview(7) draft:
> 
>    /proc interfaces
>        The following interfaces can be used to limit the amount of  ker‐
>        nel  memory  consumed  by  POSIX  message  queues  and to set the
>        default attributes for new message queues:
> 
>        /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
>               This file  defines  the  value  used  for  a  new  queue's
>               mq_maxmsg setting when the queue is created with a call to
>               mq_open(3) where attr is specified as NULL.   The  default
>               value for this file is 10.  The minimum and maximum are as
>               for /proc/sys/fs/mqueue/msg_max.  If  msg_default  exceeds
>               msg_max,  a  new queue's default mq_maxmsg value is capped
>               to the msg_max limit. 

I think rephrasing this would read easier. Basically the behavior is
this:

		info->attr.mq_maxmsg = min(ipc_ns->mq_msg_max,
					   ipc_ns->mq_msg_default);
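
A stand-alone version of that rule, as a sketch only: the function name is hypothetical, and the two parameters stand in for the per-namespace mq_msg_max and mq_msg_default values.

```c
/*
 * Default mq_maxmsg for a queue created with attr == NULL:
 * the smaller of the msg_max and msg_default settings.
 */
unsigned int default_maxmsg(unsigned int msg_max, unsigned int msg_default)
{
    return msg_default < msg_max ? msg_default : msg_max;
}
```

So with msg_max at 10 and msg_default raised to 50, a new queue still gets mq_maxmsg of 10; with msg_max at 100 and msg_default at 64, it gets 64.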

Something like:
"a new queue's default mq_maxmsg value will be the smaller of msg_default and msg_max"

>  Up until Linux 2.6.28, the  default
>               mq_maxmsg  was  10;  from  Linux  2.6.28 to Linux 3.4, the
>               default was the value defined for the msg_max limit.
> 
>        /proc/sys/fs/mqueue/msg_max
>               This file can be used to view and change the ceiling value
>               for the maximum number of messages in a queue.  This value
>               acts as a ceiling on the attr->mq_maxmsg argument given to
>               mq_open(3).   The  default  value  for msg_max is 10.  The
>               minimum value is 1 (10 in  kernels  before  2.6.28).   The
>               upper  limit is HARD_MSGMAX.  The msg_max limit is ignored
>               for  privileged  processes  (CAP_SYS_RESOURCE),  but   the
>               HARD_MSGMAX ceiling is nevertheless imposed.

Note that the HARD_MSGMAX check is done *only* for privileged processes,
regular processes only check against namespace values. This is a pretty
fundamental difference. The same goes of course for msgsize:

	if (capable(CAP_SYS_RESOURCE)) {
		if (attr->mq_maxmsg > HARD_MSGMAX ||
		    attr->mq_msgsize > HARD_MSGSIZEMAX)
			return -EINVAL;
	} else {
		if (attr->mq_maxmsg > ipc_ns->mq_msg_max ||
		    attr->mq_msgsize > ipc_ns->mq_msgsize_max)
			return -EINVAL;
	}
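
To make the difference concrete, here is a user-space model of just that branch. It is a sketch, not the kernel function: the name mq_attr_check and its parameter list are hypothetical, the HARD_* values are the post-3.5 figures from the draft, and any other validation the kernel performs is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Post-3.5 ceilings as given in the man page draft. */
#define HARD_MSGMAX      65536L
#define HARD_MSGSIZEMAX  (16L * 1024 * 1024)

/*
 * Model of the quoted check. "privileged" stands in for
 * capable(CAP_SYS_RESOURCE); ns_msg_max and ns_msgsize_max stand in
 * for the per-namespace msg_max and msgsize_max settings.
 * Returns 0 if the attributes are accepted, -EINVAL otherwise.
 */
int mq_attr_check(bool privileged, long maxmsg, long msgsize,
                  long ns_msg_max, long ns_msgsize_max)
{
    /* Privileged callers are held to the hard ceilings only;
     * everyone else is held to the namespace limits only. */
    long max_msgs = privileged ? HARD_MSGMAX : ns_msg_max;
    long max_size = privileged ? HARD_MSGSIZEMAX : ns_msgsize_max;

    if (maxmsg > max_msgs || msgsize > max_size)
        return -EINVAL;
    return 0;
}
```

With the defaults (msg_max 10, msgsize_max 8192), a request for 11 messages fails for an unprivileged caller but succeeds for a privileged one, while a request for 65537 messages fails even when privileged.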


>               The  definition  of  HARD_MSGMAX has changed across kernel
>               versions:
> 
>               *  Up to Linux 2.6.32: 131072 / sizeof(void *)
> 
>               *  Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
> 
>               *  Since Linux 3.5: 65,536
> 
>        /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)

You might want to mention the units (bytes) when referring to limits.

>               This file defines the value used for a new queue's mq_msg‐
>               size  setting  when  the  queue  is created with a call to
>               mq_open(3) where attr is specified as NULL.   The  default
>               value  for this file is 8192.  The minimum and maximum are
>               as   for   /proc/sys/fs/mqueue/msgsize_max.     If    msg‐
>               size_default  exceeds  msgsize_max,  a new queue's default
>               mq_msgsize value is capped to the msgsize_max  limit.   Up
>               until  Linux 2.6.28, the default mq_msgsize was 8192; from
>               Linux 2.6.28 to Linux  3.4,  the  default  was  the  value
>               defined for the msgsize_max limit.
> 
>        /proc/sys/fs/mqueue/msgsize_max

Ditto here.

>               This  file  can  be used to view and change the ceiling on
>               the maximum message size.  This value acts as a ceiling on
>               the  attr->mq_msgsize  argument  given to mq_open(3).  The
>               default value for msgsize_max is 8192 bytes.  The  minimum
>               value  is  128 (8192 in kernels before 2.6.28).  The upper
>               limit for msgsize_max has varied across kernel versions:
> 
>               *  Before Linux 2.6.28, the upper limit is INT_MAX.
> 
>               *  From Linux 2.6.28 to 3.4, the limit is 1,048,576.
> 
>               *  Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
>                  MAX).
>               The msgsize_max limit is ignored for privileged processes
>               (CAP_SYS_RESOURCE), but, since Linux 3.5, the
>               HARD_MSGSIZEMAX ceiling is enforced for privileged processes.
> 
>        /proc/sys/fs/mqueue/queues_max
>               This  file  can be used to view and change the system-wide
>               limit on the number of message queues that can be created.
>               The default value for queues_max is 256.  The semantics of
>               this limit have changed across kernel versions as follows:
> 
>               *  Before Linux 3.5, this limit could be  changed  to  any
>                  value  in  the  range 0 to INT_MAX, but privileged pro‐
>                  cesses (CAP_SYS_RESOURCE) can exceed the limit.
> 
>               *  Since Linux 3.5, there is a ceiling for this  limit  of
>                  1024     (HARD_QUEUESMAX).      Privileged    processes
>                  (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
>                  the  HARD_QUEUESMAX  limit  is enforced even for privi‐
>                  leged processes.
> 
>               *  Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
>                  removed: no ceiling is imposed on the queues_max limit,
>                  and privileged processes (CAP_SYS_RESOURCE) can  exceed
>                  the limit.

Given that this was treated as a bug that breaks user-space, I don't
think we really want to document the behavior between 3.5 and 3.14 (all
three bullets). Stable kernels back to 3.5 now have the default
behavior, so it's as if nothing ever changed(?). Now, if you explicitly
want to document such a bug, I would agree, but just not by presenting it
as an intentional difference in behavior. Does that make sense?

Thanks,
Davidlohr




* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-29 17:28 ` Doug Ledford
@ 2014-09-30 10:12   ` Michael Kerrisk (man-pages)
  2014-09-30 17:30     ` Davidlohr Bueso
  2014-09-30 19:57     ` Doug Ledford
  2014-10-01 10:02   ` Michael Kerrisk (man-pages)
  1 sibling, 2 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-30 10:12 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Davidlohr Bueso, linux-man@vger.kernel.org, lkml, Madars Vitolins

Hi Doug,

On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford@redhat.com> wrote:
> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
>>
>> By the way, Doug, your commit ce2d52cc1364 appears to have
>> changed/broken the semantics of the files in the /dev/mqueue
>> filesystem. Formerly, the QSIZE field in these files showed
>> the number of bytes of real user data in all of the queued
>> messages. After that commit, QSIZE now includes kernel
>> overhead bytes, which does not seem very useful for user
>> space. Was that change intentional? I see no mention of the
>> change in the commit message, so it sounds like it was not
>> intended.
>
> That change didn't come in that commit.  That commit modified it, but
> didn't introduce it.

(Which commit was it then? d6629859b36 ?)

> Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
> your perspective.

Thanks for the detailed explanation below. However, I don't understand
why the (useful) work that you describe below necessitated a change in
the QSIZE value that was exposed to user space. Surely the necessary
changes could have been done internally while still leaving QSIZE to
expose the same value it ever did? As things stand now (and unless I
am missing something), QSIZE exposes an implementation-specific
internal value that has little meaning or value to user space. And,
it's unfortunate that the commit message made no mention of the fact
that there was an ABI change here.

[...]

> The man page below looks fine to me.

Thanks for checking it!

Cheers,

Michael


> It covers the various
> incarnations.  If I add some tweaks to the priorities value though, it
> will need updating again ;-)
>
> Although this section wasn't included below, I would update how the
> memory is calculated to match what I wrote above.  However, I would also
> put in a notation that the calculation can change when the kernel's
> internal implementation changes and resource usage therefore changes.
>
>> Cheers,
>>
>> Michael
>>
>> From mq_overview(7) draft:
>>
>> [...]



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-29 20:23 ` Davidlohr Bueso
@ 2014-09-30 10:49   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-09-30 10:49 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: mtk.manpages, Doug Ledford, linux-man@vger.kernel.org, lkml,
	Madars Vitolins, Manfred Spraul

Hi David,

On 09/29/2014 10:23 PM, Davidlohr Bueso wrote:
> Hi Michael,
> 
> Cc'ing Manfred.

(Thanks, I should have thought of that.)

> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant 
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
> 
> Over the years posix mqueues have increasingly become a mess *sigh* :-/
> Thanks for doing this and untangling some of the historic changes.
> 
>> From mq_overview(7) draft:
>>
>>    /proc interfaces
>>        The following interfaces can be used to limit the amount of  ker‐
>>        nel  memory  consumed  by  POSIX  message  queues  and to set the
>>        default attributes for new message queues:
>>
>>        /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
>>               This file  defines  the  value  used  for  a  new  queue's
>>               mq_maxmsg setting when the queue is created with a call to
>>               mq_open(3) where attr is specified as NULL.   The  default
>>               value for this file is 10.  The minimum and maximum are as
>>               for /proc/sys/fs/mqueue/msg_max.  If  msg_default  exceeds
>>               msg_max,  a  new queue's default mq_maxmsg value is capped
>>               to the msg_max limit. 
> 
> I think rephrasing this would read easier. Basically the behavior is
> this:
> 
> 		info->attr.mq_maxmsg = min(ipc_ns->mq_msg_max,
> 					   ipc_ns->mq_msg_default);
> 
> Something like:
> "a new queue's default mq_maxmsg value will be the smallest of msg_default and msg_max"

Yes, better. Changed. Thanks.
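
For reference, the rule David describes can be written as a tiny helper. This is only a sketch: the function name and the plain long parameters are illustrative stand-ins for the per-namespace values, not kernel code.

```c
/* Sketch: default mq_maxmsg chosen when mq_open() is called with
 * attr == NULL (Linux 3.5 and later, per the kernel excerpt above).
 * msg_default and msg_max stand in for the two /proc values. */
long default_mq_maxmsg(long msg_default, long msg_max)
{
    return msg_default < msg_max ? msg_default : msg_max;
}
```

With the shipped defaults (10 and 10) this yields 10; raising msg_default to 64 while leaving msg_max at 10 still yields 10.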
> 
>>  Up until Linux 2.6.28, the  default
>>               mq_maxmsg  was  10;  from  Linux  2.6.28 to Linux 3.4, the
>>               default was the value defined for the msg_max limit.
>>
>>        /proc/sys/fs/mqueue/msg_max
>>               This file can be used to view and change the ceiling value
>>               for the maximum number of messages in a queue.  This value
>>               acts as a ceiling on the attr->mq_maxmsg argument given to
>>               mq_open(3).   The  default  value  for msg_max is 10.  The
>>               minimum value is 1 (10 in  kernels  before  2.6.28).   The
>>               upper  limit is HARD_MSGMAX.  The msg_max limit is ignored
>>               for  privileged  processes  (CAP_SYS_RESOURCE),  but   the
>>               HARD_MSGMAX ceiling is nevertheless imposed.
> 
> Note that the HARD_MSGMAX check is done *only* for privileged processes,
> regular processes only check against namespace values. This is a pretty
> fundamental difference. The same goes of course for msgsize:

Yes, I understand. But, the existing text still seems okay to me. The
thing is that HARD_MSGMAX is still in effect a limit for unprivileged 
processes also, since it is a ceiling on 'msg_max'. See what I mean?

> 	if (capable(CAP_SYS_RESOURCE)) {
> 		if (attr->mq_maxmsg > HARD_MSGMAX ||
> 		    attr->mq_msgsize > HARD_MSGSIZEMAX)
> 			return -EINVAL;
> 	} else {
> 		if (attr->mq_maxmsg > ipc_ns->mq_msg_max ||
> 		    attr->mq_msgsize > ipc_ns->mq_msgsize_max)
> 			return -EINVAL;
> 	}
> 
> 
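
To make the two code paths concrete, here is a user-space model of the upper-bound test quoted above. It is a sketch under assumptions: struct mq_limits and mq_attr_allowed() are hypothetical names, the HARD_* values are the "since Linux 3.5" figures from the draft, and any other validation the kernel performs is deliberately left out.

```c
#include <stdbool.h>

#define HARD_MSGMAX      65536L      /* since Linux 3.5, per the draft text */
#define HARD_MSGSIZEMAX  16777216L   /* since Linux 3.5, per the draft text */

/* Hypothetical stand-in for the per-namespace sysctl values. */
struct mq_limits {
    long msg_max;      /* /proc/sys/fs/mqueue/msg_max */
    long msgsize_max;  /* /proc/sys/fs/mqueue/msgsize_max */
};

/* Returns true if the requested attributes pass the upper-bound test:
 * privileged callers are checked against the hard ceilings only,
 * everyone else against the namespace limits only. */
bool mq_attr_allowed(bool privileged, long maxmsg, long msgsize,
                     const struct mq_limits *lim)
{
    if (privileged)
        return maxmsg <= HARD_MSGMAX && msgsize <= HARD_MSGSIZEMAX;
    return maxmsg <= lim->msg_max && msgsize <= lim->msgsize_max;
}
```

Because msg_max itself cannot be raised above HARD_MSGMAX, the unprivileged branch can never admit a value that the privileged branch would reject, which is the sense in which the hard ceiling applies to both.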
>>               The  definition  of  HARD_MSGMAX has changed across kernel
>>               versions:
>>
>>               *  Up to Linux 2.6.32: 131072 / sizeof(void *)
>>
>>               *  Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
>>
>>               *  Since Linux 3.5: 65,536
>>
>>        /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
> 
> You might want to mention the units (bytes) when referring to limits.

I added the words "bytes" in the text near here.
 
>>               This file defines the value used for a new queue's mq_msg‐
>>               size  setting  when  the  queue  is created with a call to
>>               mq_open(3) where attr is specified as NULL.   The  default
>>               value  for this file is 8192.  The minimum and maximum are
>>               as   for   /proc/sys/fs/mqueue/msgsize_max.     If    msg‐
>>               size_default  exceeds  msgsize_max,  a new queue's default
>>               mq_msgsize value is capped to the msgsize_max  limit.   Up
>>               until  Linux 2.6.28, the default mq_msgsize was 8192; from
>>               Linux 2.6.28 to Linux  3.4,  the  default  was  the  value
>>               defined for the msgsize_max limit.
>>
>>        /proc/sys/fs/mqueue/msgsize_max
> 
> Ditto here.

("bytes" does already get mentioned below.)

>>               This  file  can  be used to view and change the ceiling on
>>               the maximum message size.  This value acts as a ceiling on
>>               the  attr->mq_msgsize  argument  given to mq_open(3).  The
>>               default value for msgsize_max is 8192 bytes.  The  minimum
>>               value  is  128 (8192 in kernels before 2.6.28).  The upper
>>               limit for msgsize_max has varied across kernel versions:
>>
>>               *  Before Linux 2.6.28, the upper limit is INT_MAX.
>>
>>               *  From Linux 2.6.28 to 3.4, the limit is 1,048,576.
>>
>>               *  Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
>>                  MAX).
>>               The  msgsize_max  limit  is ignored for privileged processes
>>               (CAP_SYS_RESOURCE), but, since Linux  3.5,  the  HARD_MSG‐
>>               SIZEMAX ceiling is enforced for privileged processes.
>>
>>        /proc/sys/fs/mqueue/queues_max
>>               This  file  can be used to view and change the system-wide
>>               limit on the number of message queues that can be created.
>>               The default value for queues_max is 256.  The semantics of
>>               this limit have changed across kernel versions as follows:
>>
>>               *  Before Linux 3.5, this limit could be  changed  to  any
>>                  value  in  the  range 0 to INT_MAX, but privileged pro‐
>>                  cesses (CAP_SYS_RESOURCE) can exceed the limit.
>>
>>               *  Since Linux 3.5, there is a ceiling for this  limit  of
>>                  1024     (HARD_QUEUESMAX).      Privileged    processes
>>                  (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
>>                  the  HARD_QUEUESMAX  limit  is enforced even for privi‐
>>                  leged processes.
>>
>>               *  Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
>>                  removed: no ceiling is imposed on the queues_max limit,
>>                  and privileged processes (CAP_SYS_RESOURCE) can  exceed
>>                  the limit.
> 
> Given that this was treated as a bug that breaks user-space, I don't
> think we really want to document the behavior between 3.5 and 3.14 (all
> three bullets). Stable kernels back to 3.5 now have the default
> behavior, so it's as if nothing ever changed(?). Now, if you explicitly want
> to document such a bug, I would agree, but just not mention it as an
> intentional difference in behavior. Does that make sense?

Yes. I've simplified that piece to just:

       /proc/sys/fs/mqueue/queues_max
              This file can be used to view and change  the  system-wide
              limit on the number of message queues that can be created.
              The default value for queues_max is 256.   No  ceiling  is
              imposed  on  the  queues_max  limit;  privileged processes
              (CAP_SYS_RESOURCE) can exceed the limit (but see BUGS).

plus:

    BUGS
       In  Linux  versions  3.5 to 3.14, the kernel imposed a ceiling of
       1024 (HARD_QUEUESMAX) on the value to which the queues_max  limit
       could be raised, and the ceiling was enforced even for privileged
       processes.  This ceiling value was removed  in  Linux  3.14,  and
       patches  to stable kernels 3.5.x to 3.13.x also removed the ceil‐
       ing.

Okay?
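
In case it helps anyone checking their own system, here is a minimal sketch for reading one of these limits from user space. The helper names parse_limit() and read_limit() are made up for illustration; the only thing assumed about the file is that it contains a single decimal integer.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal integer from the start of text.
 * Returns 0 on success, -1 if no number could be parsed. */
int parse_limit(const char *text, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    *out = v;
    return 0;
}

/* Read the first line of path and parse it as a limit.
 * Returns 0 on success, -1 on open, read, or parse failure. */
int read_limit(const char *path, long *out)
{
    char buf[64];
    char *line;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    line = fgets(buf, sizeof(buf), fp);
    fclose(fp);
    return line != NULL ? parse_limit(line, out) : -1;
}
```

A call such as read_limit("/proc/sys/fs/mqueue/queues_max", &v) would be expected to store 256 on a system with the default setting.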

Thanks for the careful review, David.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-30 10:12   ` Michael Kerrisk (man-pages)
@ 2014-09-30 17:30     ` Davidlohr Bueso
  2014-09-30 17:42       ` Davidlohr Bueso
  2014-09-30 19:57     ` Doug Ledford
  1 sibling, 1 reply; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-30 17:30 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
	Manfred Spraul, Andrew Morton

On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Doug,
> 
> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford@redhat.com> wrote:
> > On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> >> Hello Doug, David,
> >>
> >> I think you two were the last ones to make significant
> >> changes to the semantics of the files in /proc/sys/fs/mqueue,
> >> so I wonder if you (or anyone else who is willing) might
> >> take a look at the man page text below that I've written
> >> (for the mq_overview(7) page) to describe past and current
> >> reality, and let me know of improvements or corrections.
> >>
> >> By the way, Doug, your commit ce2d52cc1364 appears to have
> >> changed/broken the semantics of the files in the /dev/mqueue
> >> filesystem. Formerly, the QSIZE field in these files showed
> >> the number of bytes of real user data in all of the queued
> >> messages. After that commit, QSIZE now includes kernel
> >> overhead bytes, which does not seem very useful for user
> >> space. Was that change intentional? I see no mention of the
> >> change in the commit message, so it sounds like it was not
> >> intended.
> >
> > That change didn't come in that commit.  That commit modified it, but
> > didn't introduce it.
> 
> (Which commit was it then? d6629859b36 ?)

By just looking at msg_insert and msg_get, I think so, yeah.

> 
> > Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
> > your perspective.
> 
> Thanks for the detailed explanation below. However, I don't understand
> why the (useful) work that you describe below necessitated a change in
> the QSIZE value that was exposed to user space. Surely the necessary
> changes could have been done internally while still leaving QSIZE to
> expose the same value it ever did? As things stand now (and unless I
> am missing something), QSIZE exposes an implementation-specific
> internal value that has little meaning or value to user space. And,
> it's unfortunate that the commit message made no mention of the fact
> that there was an ABI change here.

Agreed. And this needs to be changed back -- *although* there have been
0 bug reports afaict. Probably similarly to what we did with the
queues_max issue: stable since v3.5. Doug, any thoughts?

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-30 17:30     ` Davidlohr Bueso
@ 2014-09-30 17:42       ` Davidlohr Bueso
  0 siblings, 0 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2014-09-30 17:42 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Doug Ledford, linux-man@vger.kernel.org, lkml, Madars Vitolins,
	Manfred Spraul, Andrew Morton

On Tue, 2014-09-30 at 10:30 -0700, Davidlohr Bueso wrote:
> Agreed. And this needs to be changed back -- *although* there have been
> 0 bug reports afaict. Probably similarly to what we did with the
> queues_max issue: stable since v3.5. Doug, any thoughts?

Note that by changing back, I don't mean reverting your patches, just
not exporting the extra bits to QSIZE.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-30 10:12   ` Michael Kerrisk (man-pages)
  2014-09-30 17:30     ` Davidlohr Bueso
@ 2014-09-30 19:57     ` Doug Ledford
  2014-10-01  8:19       ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 10+ messages in thread
From: Doug Ledford @ 2014-09-30 19:57 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Davidlohr Bueso, linux-man@vger.kernel.org, lkml, Madars Vitolins

[-- Attachment #1: Type: text/plain, Size: 9812 bytes --]

On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Doug,
> 
> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford@redhat.com> wrote:
> > On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
> >> Hello Doug, David,
> >>
> >> I think you two were the last ones to make significant
> >> changes to the semantics of the files in /proc/sys/fs/mqueue,
> >> so I wonder if you (or anyone else who is willing) might
> >> take a look at the man page text below that I've written
> >> (for the mq_overview(7) page) to describe past and current
> >> reality, and let me know of improvements or corrections.
> >>
> >> By the way, Doug, your commit ce2d52cc1364 appears to have
> >> changed/broken the semantics of the files in the /dev/mqueue
> >> filesystem. Formerly, the QSIZE field in these files showed
> >> the number of bytes of real user data in all of the queued
> >> messages. After that commit, QSIZE now includes kernel
> >> overhead bytes, which does not seem very useful for user
> >> space. Was that change intentional? I see no mention of the
> >> change in the commit message, so it sounds like it was not
> >> intended.
> >
> > That change didn't come in that commit.  That commit modified it, but
> > didn't introduce it.
> 
> (Which commit was it then? d6629859b36 ?)

Yes, that's the one.

> > Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
> > your perspective.
> 
> Thanks for the detailed explanation below. However, I don't understand
> why the (useful) work that you describe below necessitated a change in
> the QSIZE value that was exposed to user space.

Given how long ago this was, I can't say for sure, old age and memory
being what it is ;-)  Most likely, when I rewrote the msg_insert
routine, I saw we were updating info->qsize and said to myself "Crap,
I've added a new structure, we have to account for it too" and made the
change.

>  Surely the necessary
> changes could have been done internally while still leaving QSIZE to
> expose the same value it ever did?

Yes, it could have.

>  As things stand now (and unless I
> am missing something), QSIZE exposes an implementation-specific
> internal value that has little meaning or value to user space.

This part is not necessarily true.  I'm pretty sure at the time I
thought the struct msg_msg was also included in qsize (even though it
isn't).  And although we've not had any reports of bugs on this, I have
a Red Hat bug against the accounting change (namely that it caught one
user off guard that they needed to increase their RLIMIT_MSGQUEUE to
create the same number/size of queues they used to be able to create)
and so it does have some value in that it's the only way a user has of
knowing just how much the overhead of their queue is biting them in the
ass in terms of that RLIMIT_MSGQUEUE test.  But, since it doesn't
include the size of each struct msg_msg, it's incomplete even for that
purpose.  Like I said in my previous email, I'm not so sure it wouldn't
be wise to include some extra data in this file (but that again would be
an ABI break).  Maybe a second line that includes something like this:

CUR_OVERHEAD: # RLIM_OVERHEAD: # RLIM_PAYLOAD: #

where CUR_OVERHEAD is how much we currently have allocated in internal
kernel structures for the current DATA on the line above, and the other
two are the amount of size we charged against the RLIMIT_MSGQUEUE
available to the user based upon their queue parameters and the
potential worst case scenario of queue usage.

>  And,
> it's unfortunate that the commit message made no mention of the fact
> that there was an ABI change here.

I don't think I realized it was an ABI change at the time.

> [...]
> 
> > The man page below looks fine to me.
> 
> Thanks for checking it!
> 
> Cheers,
> 
> Michael
> 
> 
> > It covers the various
> > incarnations.  If I add some tweaks to the priorities value though, it
> > will need updating again ;-)
> >
> > Although this section wasn't included below, I would update how the
> > memory is calculated to match what I wrote above.  However, I would also
> > put in a notation that the calculation can change when the kernel's
> > internal implementation changes and resource usage therefore changes.
> >
> >> Cheers,
> >>
> >> Michael
> >>
> >> From mq_overview(7) draft:
> >>
> >>    /proc interfaces
> >>        The following interfaces can be used to limit the amount of  ker‐
> >>        nel  memory  consumed  by  POSIX  message  queues  and to set the
> >>        default attributes for new message queues:
> >>
> >>        /proc/sys/fs/mqueue/msg_default (since Linux 3.5)
> >>               This file  defines  the  value  used  for  a  new  queue's
> >>               mq_maxmsg setting when the queue is created with a call to
> >>               mq_open(3) where attr is specified as NULL.   The  default
> >>               value for this file is 10.  The minimum and maximum are as
> >>               for /proc/sys/fs/mqueue/msg_max.  If  msg_default  exceeds
> >>               msg_max,  a  new queue's default mq_maxmsg value is capped
> >>               to the msg_max limit.  Up until Linux 2.6.28, the  default
> >>               mq_maxmsg  was  10;  from  Linux  2.6.28 to Linux 3.4, the
> >>               default was the value defined for the msg_max limit.
> >>
> >>        /proc/sys/fs/mqueue/msg_max
> >>               This file can be used to view and change the ceiling value
> >>               for the maximum number of messages in a queue.  This value
> >>               acts as a ceiling on the attr->mq_maxmsg argument given to
> >>               mq_open(3).   The  default  value  for msg_max is 10.  The
> >>               minimum value is 1 (10 in  kernels  before  2.6.28).   The
> >>               upper  limit is HARD_MSGMAX.  The msg_max limit is ignored
> >>               for  privileged  processes  (CAP_SYS_RESOURCE),  but   the
> >>               HARD_MSGMAX ceiling is nevertheless imposed.
> >>
> >>               The  definition  of  HARD_MSGMAX has changed across kernel
> >>               versions:
> >>
> >>               *  Up to Linux 2.6.32: 131072 / sizeof(void *)
> >>
> >>               *  Linux 2.6.33 to 3.4: (32768 * sizeof(void *) / 4)
> >>
> >>               *  Since Linux 3.5: 65,536
> >>
> >>        /proc/sys/fs/mqueue/msgsize_default (since Linux 3.5)
> >>               This file defines the value used for a new queue's mq_msg‐
> >>               size  setting  when  the  queue  is created with a call to
> >>               mq_open(3) where attr is specified as NULL.   The  default
> >>               value  for this file is 8192.  The minimum and maximum are
> >>               as   for   /proc/sys/fs/mqueue/msgsize_max.     If    msg‐
> >>               size_default  exceeds  msgsize_max,  a new queue's default
> >>               mq_msgsize value is capped to the msgsize_max  limit.   Up
> >>               until  Linux 2.6.28, the default mq_msgsize was 8192; from
> >>               Linux 2.6.28 to Linux  3.4,  the  default  was  the  value
> >>               defined for the msgsize_max limit.
> >>
> >>        /proc/sys/fs/mqueue/msgsize_max
> >>               This  file  can  be used to view and change the ceiling on
> >>               the maximum message size.  This value acts as a ceiling on
> >>               the  attr->mq_msgsize  argument  given to mq_open(3).  The
> >>               default value for msgsize_max is 8192 bytes.  The  minimum
> >>               value  is  128 (8192 in kernels before 2.6.28).  The upper
> >>               limit for msgsize_max has varied across kernel versions:
> >>
> >>               *  Before Linux 2.6.28, the upper limit is INT_MAX.
> >>
> >>               *  From Linux 2.6.28 to 3.4, the limit is 1,048,576.
> >>
> >>               *  Since Linux 3.5, the limit is 16,777,216 (HARD_MSGSIZE‐
> >>                  MAX).
> >>
> >>               The  msgsize_max  limit  is ignored for privileged processes
> >>               (CAP_SYS_RESOURCE), but, since Linux  3.5,  the  HARD_MSG‐
> >>               SIZEMAX ceiling is enforced for privileged processes.
> >>
> >>        /proc/sys/fs/mqueue/queues_max
> >>               This  file  can be used to view and change the system-wide
> >>               limit on the number of message queues that can be created.
> >>               The default value for queues_max is 256.  The semantics of
> >>               this limit have changed across kernel versions as follows:
> >>
> >>               *  Before Linux 3.5, this limit could be  changed  to  any
> >>                  value  in  the  range 0 to INT_MAX, but privileged pro‐
> >>                  cesses (CAP_SYS_RESOURCE) can exceed the limit.
> >>
> >>               *  Since Linux 3.5, there is a ceiling for this  limit  of
> >>                  1024     (HARD_QUEUESMAX).      Privileged    processes
> >>                  (CAP_SYS_RESOURCE) can exceed the queues_max limit, but
> >>                  the  HARD_QUEUESMAX  limit  is enforced even for privi‐
> >>                  leged processes.
> >>
> >>               *  Starting with Linux 3.14, the HARD_QUEUESMAX ceiling is
> >>                  removed: no ceiling is imposed on the queues_max limit,
> >>                  and privileged processes (CAP_SYS_RESOURCE) can  exceed
> >>                  the limit.
> >>
> >
> >
> > --
> > Doug Ledford <dledford@redhat.com>
> >               GPG KeyID: 0E572FDD
> >
> >
> 
> 
> 


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-30 19:57     ` Doug Ledford
@ 2014-10-01  8:19       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-01  8:19 UTC (permalink / raw)
  To: Doug Ledford
  Cc: mtk.manpages, Davidlohr Bueso, linux-man@vger.kernel.org, lkml,
	Madars Vitolins

Hi Doug,

On 09/30/2014 09:57 PM, Doug Ledford wrote:
> On Tue, 2014-09-30 at 12:12 +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Doug,
>>
>> On Mon, Sep 29, 2014 at 7:28 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>>>> Hello Doug, David,
>>>>
>>>> I think you two were the last ones to make significant
>>>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>>>> so I wonder if you (or anyone else who is willing) might
>>>> take a look at the man page text below that I've written
>>>> (for the mq_overview(7) page) to describe past and current
>>>> reality, and let me know of improvements or corrections.
>>>>
>>>> By the way, Doug, your commit ce2d52cc1364 appears to have
>>>> changed/broken the semantics of the files in the /dev/mqueue
>>>> filesystem. Formerly, the QSIZE field in these files showed
>>>> the number of bytes of real user data in all of the queued
>>>> messages. After that commit, QSIZE now includes kernel
>>>> overhead bytes, which does not seem very useful for user
>>>> space. Was that change intentional? I see no mention of the
>>>> change in the commit message, so it sounds like it was not
>>>> intended.
>>>
>>> That change didn't come in that commit.  That commit modified it, but
>>> didn't introduce it.
>>
>> (Which commit was it then? d6629859b36 ?)
> 
> Yes, that's the one.
> 
>>> Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
>>> your perspective.
>>
>> Thanks for the detailed explanation below. However, I don't understand
>> why the (useful) work that you describe below necessitated a change in
>> the QSIZE value that was exposed to user space.
> 
> Given how long ago this was, I can't say for sure, old age and memory
> being what it is ;-)  Most likely, when I rewrote the msg_insert
> routine, I saw we were updating info->qsize and said to myself "Crap,
> I've added a new structure, we have to account for it too" and made the
> change.
> 
>>  Surely the necessary
>> changes could have been done internally while still leaving QSIZE to
>> expose the same value it ever did?
> 
> Yes, it could have.
> 
>>  As things stand now (and unless I
>> am missing something), QSIZE exposes an implementation-specific
>> internal value that has little meaning or value to user space.
> 
> This part is not necessarily true.  I'm pretty sure at the time I
> thought the struct msg_msg was also included in qsize (even though it
> isn't).  And although we've not had any reports of bugs on this, I have
> a Red Hat bug against the accounting change (namely that it caught one
> user off guard that they needed to increase their RLIMIT_MSGQUEUE to
> create the same number/size of queues they used to be able to create)
> and so it does have some value in that it's the only way a user has of
> knowing just how much the overhead of their queue is biting them in the
> ass in terms of that RLIMIT_MSGQUEUE test.  But, since it doesn't
> include the size of each struct msg_msg, it's incomplete even for that
> purpose.  Like I said in my previous email, I'm not so sure it wouldn't
> be wise to include some extra data in this file (but that again would be
> an ABI break).  Maybe a second line that includes something like this:
> 
> CUR_OVERHEAD: # RLIM_OVERHEAD: # RLIM_PAYLOAD: #
> 
> where CUR_OVERHEAD is how much we currently have allocated in internal
> kernel structures for the current DATA on the line above, and the other
> two are the amount of size we charged against the RLIMIT_MSGQUEUE
> available to the user based upon their queue parameters and the
> potential worst case scenario of queue usage.
> 
>>  And,
>> it's unfortunate that the commit message made no mention of the fact
>> that there was an ABI change here.
> 
> I don't think I realized it was an ABI change at the time.

So, to summarize:

* QSIZE returning a count of the user data bytes in the queue was
  the actual (and intended and documented) behavior from Linux 
  2.6.6 to 3.4.
* Linux 3.5 changed the value exposed by QSIZE to something
  that more closely matches the amount of memory
  consumed by the kernel implementation. However:

     -- That change broke the ABI.
     -- The newly exposed value still doesn't match the
        consumed memory as accounted against RLIMIT_MSGQUEUE,
        so it's still not really useful.

* No-one complained about the QSIZE ABI change yet (well, except 
  me), but that doesn't mean no-one has been bitten. After
  all, it took a while for reports about the HARD_QUEUESMAX
  breakage to filter through.

I think QSIZE really should be fixed to expose the same value
it used to expose, which is the real number of user data bytes
in the queue. I'm agnostic on whether or not further fields
along the lines you suggest should be added to the /dev/mqueue 
files. In my opinion, that's an ABI extension, but not a breakage:
those files have been designed for easy parsing with fields of
the form "name:value", and properly designed applications
won't be tripped up by extensions to the format.
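
As an illustration of what "properly designed" might mean here, the sketch below looks up a single field by name and skips everything it does not recognize. The function name is hypothetical, and it assumes the existing layout of whitespace-separated "NAME:value" tokens with no space after the colon.

```c
#include <stdlib.h>
#include <string.h>

/* Look up one "NAME:value" field in whitespace-separated text.
 * Unknown fields are skipped, so fields added later do not break
 * the lookup. Returns 0 and stores the value on success, -1 if the
 * field is absent or has no numeric value. */
int mq_status_field(const char *text, const char *name, long *out)
{
    size_t nlen = strlen(name);
    const char *p = text;

    while (*p != '\0') {
        size_t toklen;

        p += strspn(p, " \t\n");          /* skip separators */
        if (*p == '\0')
            break;
        toklen = strcspn(p, " \t\n");     /* length of this token */
        if (toklen > nlen + 1 &&
            strncmp(p, name, nlen) == 0 && p[nlen] == ':') {
            char *end;
            long v = strtol(p + nlen + 1, &end, 10);

            if (end == p + nlen + 1)
                return -1;                /* no digits after the colon */
            *out = v;
            return 0;
        }
        p += toklen;
    }
    return -1;
}
```

Given the text "QSIZE:129     NOTIFY:2    SIGNO:0    NOTIFY_PID:8260", a lookup of "QSIZE" stores 129, and it keeps doing so if further name:value tokens are appended.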

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Document POSIX MQ /proc/sys/fs/mqueue files
  2014-09-29 17:28 ` Doug Ledford
  2014-09-30 10:12   ` Michael Kerrisk (man-pages)
@ 2014-10-01 10:02   ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-01 10:02 UTC (permalink / raw)
  To: Doug Ledford
  Cc: mtk.manpages, Davidlohr Bueso, linux-man@vger.kernel.org, lkml,
	Madars Vitolins

On 09/29/2014 07:28 PM, Doug Ledford wrote:
> On Mon, 2014-09-29 at 11:10 +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Doug, David,
>>
>> I think you two were the last ones to make significant 
>> changes to the semantics of the files in /proc/sys/fs/mqueue,
>> so I wonder if you (or anyone else who is willing) might
>> take a look at the man page text below that I've written
>> (for the mq_overview(7) page) to describe past and current
>> reality, and let me know of improvements or corrections.
>>
>> By the way, Doug, your commit ce2d52cc1364 appears to have
>> changed/broken the semantics of the files in the /dev/mqueue 
>> filesystem. Formerly, the QSIZE field in these files showed
>> the number of bytes of real user data in all of the queued
>> messages. After that commit, QSIZE now includes kernel 
>> overhead bytes, which does not seem very useful for user 
>> space. Was that change intentional? I see no mention of the
>> change in the commit message, so it sounds like it was not 
>> intended.
> 
> That change didn't come in that commit.  That commit modified it, but
> didn't introduce it.
> 
> Now, was it intentional? Yes.  Is it valuable, useful?  That depends on
> your perspective.
> 
> One of the problems I ran into with that code relates to the rlimit
> checks that happen at queue creation time.  We used to check to see if
> 
>  msg_num * (msg_size + sizeof struct msg_msg *)
> 
> would fit within the user's currently available rlimit for
> RLIMIT_MSGQUEUE.  This was not an accurate check though.  It accounted
> for the msg number, and the payload size, and the array of pointers we
> used to point to the msg_msg structs that held each message, but ignored
> the msg_msg structs themselves.  Given that we accept the creation of
> message queues with a msg_size of 1, this could be used to create a
> minor DoS because of the fact that there was such a large size
> difference between the sizeof struct msg_msg and the size of our
> messages.  In this scenario, a msg_size of 1 would result in us
> accounting 9/5 bytes per message on 64-bit/32-bit OSes respectively, but
> actually using 49 bytes/19 bytes.  That's a 4:1 ratio at the
> worst case for the difference between actual memory used and memory usage
> accounted against the RLIMIT_MSGQUEUE limit. So before I ever got around
> to doing the rbtree update, I fixed this to at least be more accurate
> and it became
> 
>  msg_num * (msg_size + sizeof struct msg_msg * + sizeof struct msg_msg)
> 
> Even this wasn't totally accurate though, as large messages could result
> in the allocation of additional msg_msgseg segments.  However, I ignored
> that inaccuracy because once the message size is large enough to need
> additional SG segments, we are no longer in danger of any sort of minor
> DoS because our own overhead will become nothing more than noise to the
> calculation.
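
To make the accounting gap described above concrete, here is a rough
Python sketch of the two per-queue checks. The function names are
hypothetical, and the structure sizes are passed in as parameters
because sizeof(struct msg_msg) is kernel-internal and varies by
architecture and kernel version. The 40-byte header used in the
example is an assumption chosen only because it reproduces the
49-byte figure quoted above for a 64-bit system.

```python
def accounted_bytes(msg_num, msg_size, ptr_size):
    """Original check: payload plus one pointer slot per message."""
    return msg_num * (msg_size + ptr_size)


def accounted_bytes_with_header(msg_num, msg_size, ptr_size, msg_msg_size):
    """Revised check: additionally charge one msg_msg header per message."""
    return msg_num * (msg_size + ptr_size + msg_msg_size)


if __name__ == "__main__":
    # Assumed sizes: 8-byte pointers, 40-byte msg_msg header (illustrative).
    old = accounted_bytes(1, 1, 8)
    new = accounted_bytes_with_header(1, 1, 8, 40)
    print("charged before:", old, "charged after:", new)
```

With a one-byte payload the header dominates, which is why the
original check under-counted so badly for tiny messages.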

So, for what it's worth, I applied the following patch in getrlimit.2
to describe the post-3.5 behavior. Does it look okay?

Cheers,

Michael



diff --git a/man2/getrlimit.2 b/man2/getrlimit.2
index 91fed13..a3e4285 100644
--- a/man2/getrlimit.2
+++ b/man2/getrlimit.2
@@ -250,8 +250,19 @@ Each message queue that the user creates counts (until it is removed)
 against this limit according to the formula:
 .nf
 
-    bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) +
-            attr.mq_maxmsg * attr.mq_msgsize
+    Since Linux 3.5:
+        bytes = attr.mq_maxmsg * sizeof(struct msg_msg) +
+                min(attr.mq_maxmsg, MQ_PRIO_MAX) *
+                      sizeof(struct posix_msg_tree_node)+ 
+                                /* For overhead */
+                attr.mq_maxmsg * attr.mq_msgsize;
+                                /* For message data */
+
+    Linux 3.4 and earlier:
+        bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) +
+                                /* For overhead */
+                attr.mq_maxmsg * attr.mq_msgsize;
+                                /* For message data */
 
 .fi
 where
@@ -259,11 +270,16 @@ where
 is the
 .I mq_attr
 structure specified as the fourth argument to
-.BR mq_open (3).
+.BR mq_open (3),
+and the
+.I msg_msg
+and
+.I posix_msg_tree_node
+structures are kernel-internal structures.
 
-The first addend in the formula, which includes
-.I "sizeof(struct msg_msg\ *)"
-(4 bytes on Linux/i386), ensures that the user cannot
+The "overhead" addend in the formula accounts for overhead
+bytes required by the implementation
+and ensures that the user cannot
 create an unlimited number of zero-length messages (such messages
 nevertheless each consume some system memory for bookkeeping overhead).
 .TP
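
For anyone who wants to plug numbers into the two formulas from the
patch, the following Python sketch may help. It is only an
illustration: the function names are hypothetical, the kernel-internal
structure sizes must be supplied by the caller (the values in the
example are assumptions, not measured sizes), and MQ_PRIO_MAX is
assumed to be 32768, its value in the Linux headers.

```python
MQ_PRIO_MAX = 32768  # assumed: value from the Linux headers


def mq_bytes_3_4_and_earlier(mq_maxmsg, mq_msgsize, ptr_size):
    """Linux 3.4 and earlier: one pointer of overhead per message."""
    overhead = mq_maxmsg * ptr_size
    data = mq_maxmsg * mq_msgsize
    return overhead + data


def mq_bytes_since_3_5(mq_maxmsg, mq_msgsize, msg_msg_size, tree_node_size,
                       mq_prio_max=MQ_PRIO_MAX):
    """Since Linux 3.5: msg_msg headers plus priority-tree nodes."""
    overhead = (mq_maxmsg * msg_msg_size
                + min(mq_maxmsg, mq_prio_max) * tree_node_size)
    data = mq_maxmsg * mq_msgsize
    return overhead + data


if __name__ == "__main__":
    # Assumed sizes for illustration: 8-byte pointer, 48-byte msg_msg,
    # 40-byte posix_msg_tree_node.
    print(mq_bytes_3_4_and_earlier(10, 8192, 8))
    print(mq_bytes_since_3_5(10, 8192, 48, 40))
```

Note that the tree-node term stops growing once mq_maxmsg exceeds
MQ_PRIO_MAX, since there can be at most one node per priority level.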

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Thread overview: 10+ messages
2014-09-29  9:10 Document POSIX MQ /proc/sys/fs/mqueue files Michael Kerrisk (man-pages)
2014-09-29 17:28 ` Doug Ledford
2014-09-30 10:12   ` Michael Kerrisk (man-pages)
2014-09-30 17:30     ` Davidlohr Bueso
2014-09-30 17:42       ` Davidlohr Bueso
2014-09-30 19:57     ` Doug Ledford
2014-10-01  8:19       ` Michael Kerrisk (man-pages)
2014-10-01 10:02   ` Michael Kerrisk (man-pages)
2014-09-29 20:23 ` Davidlohr Bueso
2014-09-30 10:49   ` Michael Kerrisk (man-pages)
