Message-ID: <5225AA8D.6080403@colorfullife.com>
Date: Tue, 03 Sep 2013 11:23:25 +0200
From: Manfred Spraul
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8
MIME-Version: 1.0
To: Vineet Gupta
CC: Linus Torvalds, Davidlohr Bueso, Sedat Dilek, linux-next, LKML,
 Stephen Rothwell, Andrew Morton, linux-mm, Andi Kleen, Rik van Riel,
 Jonathan Gonzalez
Subject: Re: ipc-msg broken again on 3.11-rc7?
References: <52205597.3090609@synopsys.com> <5224BCF6.2080401@colorfullife.com> <5225A466.2080303@colorfullife.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/03/2013 11:16 AM, Vineet Gupta wrote:
> On 09/03/2013 02:27 PM, Manfred Spraul wrote:
>> On 09/03/2013 10:44 AM, Vineet Gupta wrote:
>>>> b) Could you check that it is not just a performance regression?
>>>> Does ./msgctl08 1000 16 hang, too?
>>> Nope, that doesn't hang. The minimal configuration that hangs reliably is msgctl
>>> 50000 2
>>>
>>> With this config there are 3 processes.
>>> ...
>>>  555   554 root     S     1208   0.4   0   0.0  ./msgctl08 50000 2
>>>  554   551 root     S     1208   0.4   0   0.0  ./msgctl08 50000 2
>>>  551   496 root     S     1208   0.4   0   0.0  ./msgctl08 50000 2
>>> ...
>>>
>>> [ARCLinux]$ cat /proc/551/stack
>>> [<80aec3c6>] do_wait+0xa02/0xc94
>>> [<80aecad2>] SyS_wait4+0x52/0xa4
>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>
>>> [ARCLinux]$ cat /proc/555/stack
>>> [<80c2950e>] SyS_msgrcv+0x252/0x420
>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>
>>> [ARCLinux]$ cat /proc/554/stack
>>> [<80c28c82>] do_msgsnd+0x116/0x35c
>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>
>>> Is this a case of lost wakeup or some such? I'm running with some more
>>> diagnostics and will report soon ...
>> What is the output of ipcs -q? Is the queue full or empty when it hangs?
>> I.e., do we forget to wake up a receiver or forget to wake up a sender?
> / # ipcs -q
>
> ------ Message Queues --------
> key        msqid      owner      perms      used-bytes   messages
> 0x72d83160 163841     root       600        0            0
>

Ok, a sender is sleeping - even though there are no messages in the queue.
Perhaps it is the race that I mentioned in a previous mail:

>     for (;;) {
>         struct msg_sender s;
>
>         err = -EACCES;
>         if (ipcperms(ns, &msq->q_perm, S_IWUGO))
>             goto out_unlock1;
>
>         err = security_msg_queue_msgsnd(msq, msg, msgflg);
>         if (err)
>             goto out_unlock1;
>
>         if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
>                 1 + msq->q_qnum <= msq->q_qbytes) {
>             break;
>         }
> [snip]
>     if (!pipelined_send(msq, msg)) {
>         /* no one is waiting for this message, enqueue it */
>         list_add_tail(&msg->m_list, &msq->q_messages);
>         msq->q_cbytes += msgsz;
>         msq->q_qnum++;
>         atomic_add(msgsz, &ns->msg_bytes);

The access to msq->q_cbytes is not protected: the lock can be dropped and
reacquired between the free-space test and the enqueue, so the test may act
on stale values.

Vineet, could you try moving the test for free space to after ipc_lock?
I.e., the lock must not be dropped between testing for free space and
enqueueing the message.

--
    Manfred