From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xue jiufei
Date: Wed, 27 Aug 2014 09:57:38 +0800
Subject: [Ocfs2-devel] A deadlock when system do not has sufficient memory
In-Reply-To:
References: <53F41CAE.2040204@huawei.com> <53F6FFB7.1090305@huawei.com> <53FA99FD.1090008@huawei.com> <53FACC99.8070701@huawei.com> <53FAD239.7020709@huawei.com>
Message-ID: <53FD3B12.5070802@huawei.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ocfs2-devel@oss.oracle.com

Hi, Sunil

On 2014/8/26 1:13, Sunil Mushran wrote:
> On Sun, Aug 24, 2014 at 11:05 PM, Joseph Qi wrote:
>
> > On 2014/8/25 13:45, Sunil Mushran wrote:
> > > Please could you expand on that.
> >
> > In our scenario, one node can mount multiple volumes across the
> > cluster.
> > For instance, N1 has mounted ocfs2 volumes say volume1, volume2,
> > volume3. And volume3 may do umount/mount during runtime of other
> > volumes.
>
> I meant expand on the deadlock. Say we are mounting a new volume and
> that triggers an inode cleanup. That inode being cleaned up will have
> to be from one of the mounted volumes. How can this lead to a deadlock?
>
> Two variations:
> a) Node death leading to recovery during the mount.
> b) Mount atop a mount.
>
> But I still cannot see a deadlock in either scenario.

The deadlock situation is the same as the one I described in my first
mail.

o2net_wq
-> dlm_query_region_handler
-> kmalloc (not enough free memory)
-> triggers ocfs2 inode cleanup
-> ocfs2_drop_lock
-> calls o2net_send_message to send the unlock message
-> wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw)) to wait for
   the reply from the master
-> the TCP layer receives the reply and calls o2net_data_ready
-> queues sc_rx_work, but o2net_wq cannot handle this work

So this triggers the deadlock: o2net_wq is waiting for itself to handle
the unlock reply and complete the nsw.

Thanks.
Xuejiufei
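
To make the self-wait easier to see, here is a minimal user-space sketch of the same pattern. It is not kernel code: a single-worker Python thread pool stands in for o2net_wq, and the names handler, rx_work, and run_demo are hypothetical. The timeout is also an assumption added only so the demo terminates; the wait_event() in the call chain above has no timeout, so there the wait would never end.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_demo(timeout=0.5):
    """Show a single-worker queue waiting on work queued behind itself."""
    # Stand-in for o2net_wq: exactly one worker processes all queued work.
    wq = ThreadPoolExecutor(max_workers=1)
    # Stand-in for the nsw completion that the sender waits on.
    reply_handled = threading.Event()

    def rx_work():
        # Stand-in for sc_rx_work: handling the reply completes the waiter.
        reply_handled.set()

    def handler():
        # Stand-in for the message handler that ends up sending an unlock
        # message. The reply arrives and its processing is queued on the
        # same single-worker queue this handler is running on.
        wq.submit(rx_work)
        # Stand-in for wait_event(): block until the reply is handled.
        # The only worker is busy here, so rx_work cannot run yet.
        return reply_handled.wait(timeout)

    completed_in_time = wq.submit(handler).result()
    # Once the handler gives up, the worker is free and rx_work finally runs.
    wq.shutdown(wait=True)
    return completed_in_time, reply_handled.is_set()


if __name__ == "__main__":
    completed_in_time, handled_later = run_demo()
    print("reply handled while handler waited:", completed_in_time)
    print("reply handled after handler returned:", handled_later)
```

In this sketch the reply work only runs after the handler stops waiting, which is the same ordering problem as in the chain above: the work that would complete the wait is queued behind the waiter.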