From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jia-Ju Bai Subject: Re: [PATCH] atm: firestream: Replace GFP_ATOMIC with GFP_KERNEL in fs_send Date: Sat, 27 Jan 2018 12:09:38 +0800 Message-ID: References: <1516953627-30983-1-git-send-email-baijiaju1990@gmail.com> <20180126120522.GX13338@ZenIV.linux.org.uk> <20180126.110739.419621238821097728.davem@davemloft.net> <20180126170826.GY13338@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: 3chas3@gmail.com, linux-atm-general@lists.sourceforge.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org To: Al Viro , David Miller Return-path: Received: from mail-pf0-f196.google.com ([209.85.192.196]:44283 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751475AbeA0EJ7 (ORCPT ); Fri, 26 Jan 2018 23:09:59 -0500 In-Reply-To: <20180126170826.GY13338@ZenIV.linux.org.uk> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 2018/1/27 1:08, Al Viro wrote: > On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote: >>>> This is found by a static analysis tool named DCNS written by myself. >>> The trouble is, places like >>> net/atm/raw.c:65: vcc->send = atm_send_aal0; >>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send; >>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send; >>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g. >>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb) >>> ? DROP_PACKET : 1; >>> bh_unlock_sock(sk_atm(vcc)); >>> in pppoatm_send() definitely is called under a spinlock. >>> >>> Looking through the driver (in advanced bitrot, as usual for drivers/atm), >>> I'd say that submit_queue() is fucked in head in the "queue full" case. >>> And judging by the history, had been thus since the original merge... >> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL >> conversions. >> >> Al's analysis above is similar to how things looked for other patches >> you submited of this nature. >> >> So because of the lack of care and serious auditing you put into these >> conversions, I really have no choice but to drop them from my queue >> because on the whole they are adding bugs rather than improving the >> code. > FWIW, the tool *does* promise to be useful Thanks, I am happy to hear that :) > - as far as I understand it > looks for places where GFP_ATOMIC allocation goes with blocking operations > near every callchain leading there. And that indicates something fishy > going on - either pointless GFP_ATOMIC (in benign case), or something > much nastier: a callchain that would require GFP_ATOMIC. In that case > whatever blocking operation found along the way is a bug. In fact, my tool first collects all places of GFP_ATOMIC and mdelay in the whole kernel code. Then it starts analysis from the entry of each interrupt handler and spin-lock function call in the whole kernel code, to mark the places of GFP_ATOMIC and mdelay that are called in atomic context. The remaining unmarked places of GFP_ATOMIC and mdelay are reported and can be replaced with GFP_KERNEL and mdelay (or usleep_range). Though my tool has handled some common situations of function pointers, But it does not well handle function pointer passing in this code (vcc->send = vcc->dev->ops->send), so the tool needs to be improved. I am sorry for my incorrect report... > This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c - > static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe) > { > u32 wp; > struct FS_QENTRY *cqe; > > /* XXX Sanity check: the write pointer can be checked to be > still the same as the value passed as qe... -- REW */ > /* udelay (5); */ > while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) { > fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n", > q->offset); > schedule (); > } > ... > static void submit_queue (struct fs_dev *dev, struct queue *q, > u32 cmd, u32 p1, u32 p2, u32 p3) > { > struct FS_QENTRY *qe; > > qe = get_qentry (dev, q); > qe->cmd = cmd; > qe->p0 = p1; > qe->p1 = p2; > qe->p2 = p3; > submit_qentry (dev, q, qe); > ... > static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb) > { > ... > td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC); > ... > submit_queue (dev, &dev->hp_txq, > QE_TRANSMIT_DE | vcc->channo, > virt_to_bus (td), 0, > virt_to_bus (td)); > ... > > Either all callchains leading to fs_send() are in non-atomic contexts > (in which case that GFP_ATOMIC would be pointless) or there's one > where we cannot block. Any such would be a potential deadlock. > > And the latter appears to be the case - fs_send() is atmdev_ops ->send(), > which can end up in atm_vcc ->send(), which can be called from under > ->sk_lock.slock. Yes, I think schedule() can cause a sleep-in-atomic-context bug. > What would be really useful: > * list of "here's a list of locations where we do something > blocking; each callchain to this GFP_ATOMIC allocation passes either > upstream of one of those without leaving atomic context in between > or downstream without entering one". > * after that - backtracking these callchains further, watching > for ones in atomic contexts. Can be done manually, but if that tool > can assist in doing so, all the better. If we do find one, we have > found a deadlock - just take the blocking operation reported for > that callchain and that's it. If it's not an obvious false positive > (e.g. > if (!foo) > spin_lock(&splat); > ..... > if (foo) > schedule(); > ), it's worth reporting to maintainers of the code in question. > * if all callchains reach obviously non-atomic contexts > (syscall entry, for example, or a kernel thread payload, or > a function documented to require a mutex held by caller, etc.) - > then a switch to GFP_KERNEL might be appropriate. With analysis > of callchains posted as you are posting that. > * either way, having the tool print the callchains out > would be a good idea - for examining them, for writing reports, > etc. Thanks for your very helpful advice :) I will follow it in my patches. Thanks, Jia-Ju Bai