From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Mick
Subject: Re: CEPH RBD client kernel panic when OSD connection is lost on kernel 3.2, 3.5, 3.5.4
Date: Mon, 24 Sep 2012 11:50:16 -0700
Message-ID: <5060AB68.8090803@inktank.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-pa0-f46.google.com ([209.85.220.46]:63693 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755537Ab2IXSuT (ORCPT ); Mon, 24 Sep 2012 14:50:19 -0400
Received: by padhz1 with SMTP id hz1so1111981pad.19 for ; Mon, 24 Sep 2012 11:50:19 -0700 (PDT)
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Christian Huang
Cc: ceph-devel@vger.kernel.org, sage@inktank.com

We're looking into this, Christian.

On 09/24/2012 03:23 AM, Christian Huang wrote:
> Hi,
> we met the following issue while testing ceph cluster HA.
> Appreciate if anyone can shed some light.
> could this be related to the configuration ? (ie, 2 OSD nodes only)
>
> Issue description:
> ceph rbd client will kernel panic if an OSD server loses its
> network connectivity.
> so far, we can reproduce it with certainty.
> we have tried with the following kernels
> a. Stock kernel from 12.04 (3.2 series)
> 3.5 series, as suggested in a previous mail by Sage
> b. 3.5.0-15 from quantal repo,
> git://kernel.ubuntu.com/ubuntu/ubuntu-quantal.git, Ubuntu-3.5.0-15.22
> tag
> c. v3.5.4-quantal,
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5.4-quantal/
>
> Environment:
> OS: Ubuntu 12.04 precise pangolin
> Ceph configuration:
> OSD nodes: 2 x 12 drives , 1 os drive, 11 are mapped to OSD
> 0-10, 10GbE link
> Monitor nodes: 3 x KVM virtual machines on ubuntu host.
> test client: fresh install of Ubuntu 12.04.1
> Ceph version used: 0.48, 0.48.1, 0.48.2, 0.51
> all nodes have the same kernel version.
>
> steps to reproduce:
> on the test client,
> 1. load rbd modules
> 2. create rbd device
> 3. map rbd device
> 4. use fio tool to create work load on the device, 8 threads is
> used for workload
> we have also tried with iometer, 8 workers, 32k 50/50, same results.
>
> on one of the OSD nodes,
> 1. sudo ifconfig eth0 down #where eth0 is the primary interface
> configured for ceph.
> 2. within 30 seconds, the test client will panic.
>
> this happens when there is IO activity on the RBD device, and one
> of the OSD nodes loses connectivity.
>
> The netconsole output is available from the following
> dropbox link,
> zip: goo.gl/LHytr
>
> Best Regards
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
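
For anyone trying to reproduce this, the quoted steps could be written down roughly as below. This is only a sketch that builds and prints the command lines; it does not run anything. The image name, image size, device path, and all fio parameters other than the 8 jobs are assumptions and not taken from the report (the 32k 50/50 mix is borrowed from the iometer run mentioned above). The rbd invocations are the usual create/map forms and may need pool or auth options on a given cluster.

```python
import shlex


def build_client_commands(image="ha-test", size_mb=10240,
                          device="/dev/rbd0", threads=8):
    """Return the test-client steps as argv lists; nothing is executed.

    image, size_mb and device are hypothetical placeholders.
    """
    return [
        # 1. load rbd modules
        ["modprobe", "rbd"],
        # 2. create rbd device (size is in MB)
        ["rbd", "create", image, "--size", str(size_mb)],
        # 3. map rbd device
        ["rbd", "map", image],
        # 4. fio workload with 8 jobs; block size and mix are assumed
        ["fio", "--name=rbd-ha", f"--filename={device}",
         "--rw=randrw", "--rwmixread=50", "--bs=32k", "--direct=1",
         f"--numjobs={threads}", "--time_based", "--runtime=300"],
    ]


def build_osd_commands(interface="eth0"):
    """Return the OSD-node step: take the ceph-facing interface down."""
    return [["ifconfig", interface, "down"]]


if __name__ == "__main__":
    print("# on the test client (as root)")
    for argv in build_client_commands():
        print(shlex.join(argv))
    print("# on one OSD node (as root), while fio is running")
    for argv in build_osd_commands():
        print(shlex.join(argv))
```

According to the report, the client should panic within about 30 seconds of the interface going down while fio is still issuing I/O.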