From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932269AbbLNKK1 (ORCPT <rfc822;w@1wt.eu>);
	Mon, 14 Dec 2015 05:10:27 -0500
Received: from aserp1040.oracle.com ([141.146.126.69]:27519 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932099AbbLNKK0 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 14 Dec 2015 05:10:26 -0500
Subject: Re: [PATCH] udf: limit the maximum number of TD redirections
To: Jan Kara <jack@suse.cz>
References: <1449764021-20609-1-git-send-email-vegard.nossum@oracle.com>
 <20151214095229.GB8474@quack.suse.cz>
Cc: Jan Kara <jack@suse.com>, linux-kernel@vger.kernel.org,
        quentin.casasnovas@oracle.com
From: Vegard Nossum <vegard.nossum@oracle.com>
Message-ID: <566E9588.4010208@oracle.com>
Date: Mon, 14 Dec 2015 11:10:16 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <20151214095229.GB8474@quack.suse.cz>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0022.oracle.com [141.146.126.234]
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/14/2015 10:52 AM, Jan Kara wrote:
> On Thu 10-12-15 17:13:41, Vegard Nossum wrote:
>> Filesystem fuzzing revealed that we could get stuck in the
>> udf_process_sequence() loop.
>>
>> The maximum limit was chosen arbitrarily but fixes the problem I saw.
>
> Process nit: The patch is missing your Signed-off-by.
>

Oops, sorry! If the patch is still being considered, here it is:

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>

>> diff --git fs/udf/super.c fs/udf/super.c
>> index 81155b9..fd45537 100644
>> --- fs/udf/super.c
>> +++ fs/udf/super.c
>> @@ -1586,6 +1586,13 @@ static void udf_load_logicalvolint(struct super_block *sb, struct kernel_extent_
>>   }
>>
>>   /*
>> + * Maximum number of Terminating Descriptor redirections. The chosen number is
>> + * arbitrary - just that we hopefully don't limit any real use of rewritten
>> + * inode on write-once media but avoid looping for too long on corrupted media.
>> + */
>> +#define UDF_MAX_TD_NESTING 64
>> +
>> +/*
>>    * Process a main/reserve volume descriptor sequence.
>>    *   @block		First block of first extent of the sequence.
>>    *   @lastblock		Lastblock of first extent of the sequence.
>> @@ -1609,6 +1616,7 @@ static noinline int udf_process_sequence(
>>   	uint16_t ident;
>>   	long next_s = 0, next_e = 0;
>>   	int ret;
>> +	unsigned int indirections = 0;
>>
>>   	memset(vds, 0, sizeof(struct udf_vds_record) * VDS_POS_LENGTH);
>>
>> @@ -1679,6 +1687,12 @@ static noinline int udf_process_sequence(
>>   			}
>>   			break;
>>   		case TAG_IDENT_TD: /* ISO 13346 3/10.9 */
>> +			if (++indirections > UDF_MAX_TD_NESTING) {
>> +				udf_err(sb, "too many TDs (max %u supported)\n", UDF_MAX_TD_NESTING);
>> +				brelse(bh);
>> +				return -EIO;
>> +			}
>> +
>>   			vds[VDS_POS_TERMINATING_DESC].block = block;
>>   			if (next_e) {
>
> But this doesn't really help much. When we speak about corrupted media,
> then most likely we hit a case where descriptor CRCs won't match and so we
> return EIO. How exactly did your fuzzing corrupt the filesystem? I suppose
> it hardly created long sequence of correct VDP descriptors just by pure
> "luck".

I think the problem is circular TD descriptors, i.e. in my case I see
this loop:

         for (; (!done && block <= lastblock); block++) {

going endlessly over blocks {257, 258, 259, 260, 261, 262} without ever
returning.

I put my check in the TAG_IDENT_TD case because that's where "block" is
assigned apart from just being incremented, but I see there's some more
stuff going on with next_s/next_e/lastblock as well (maybe involving
TAG_IDENT_VDP). However, I don't really know UDF so maybe there is a
better way to stop the infinite loop from happening.

> When we speak about malicious media, then we are free to generate volume
> descriptor sequence of valid descriptors that doesn't have a single
> terminating descriptor and so we will read the whole area for volume
> descriptor sequence described in the anchor anyway. So I don't see the case
> where your patch would significantly help.
>
> Frankly a malicious volume that makes the kernel read a lot of disk when
> mounting fails to excite me much. But what we could do is to make mount
> process killable when reading the descriptor sequence. That should at least
> allow sysadmin to terminate mount of malicious media. Hmm?

As I said, I think the problem is a recursive structure rather than just
a very long one. BTW I lifted the "format" of the patch from your
commit c03aa9f6e1f9. There seems to be at least 1 or 2 other similar
problems, see the second patch I sent. I would be happy to point to the
problematic places if you have a better way of fixing these.

Thanks for the response,


Vegard