sergey senozhatsky's blog: ext4 regression

Hello,

Commit 66e61a9e9504f61b9a928c9055368c81da613a50 intorduced
ext4: Once a day, printk file system error information to dmesg

via kernel timer.

Error report may look like
[ 313.485876] EXT4-fs (sda6): error count: 13
[ 313.485887] EXT4-fs (sda6): initial error at 1283093815: ext4_lookup:1052: inode 4980737
[ 313.485895] EXT4-fs (sda6): last error at 1283094174: ext4_lookup:1052: inode 4980737

and I find it quite useful.

Sad but true - calling print_daily_error_info (by timer event)
on umounted fs will cause NULL pointer derefernce on superblock inode
(EXT4_SB(sb) returns NULL) resulting in OOPS (fatal error during
soft IRQ).

Stack trace will look similar to this one:
(lots of helpfull info cut)

IRQ
        run_timer_softirq
        ?run_timer_softirq
        ?print_daily_error_info
        ?__do_softirq
        __do_softirq
        call_softirq
        do_softirq
        irq_exit
        smp_apic_timer_interrupt
        apic_timer_interrupt
EOI
        intel_idle
        intel_idel
        cpuidle_idle_call
        cpu_idle
        start_secondary

And the solution is:

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2614774..751997d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -719,6 +719,7 @@ static void ext4_put_super(struct super_block *sb)
             ext4_abort(sb, "Couldn't clean up the journal");
     }

+    del_timer(&sbi->s_err_report);
     ext4_release_system_zone(sb);
     ext4_mb_release(sb);
     ext4_ext_release(sb);

Ted Ts'o wrote:

Good catch! Thanks for the patch. I will include this into ext4
tree, and I will probably push it separately to Linus so that it gets
into 2.6.36, since this is a regresssion.

I had some concerns about print_daily_error_info.

> By the way, isn't print_daily_error_info racy? Is it safe to call
> print_daily_error_info
> (by timer event (softirq)) when we're remounting fs, etc.?

Ted Ts'o answered:

It should be fine. Remounting doesn't actually change out the struct
superblock. There is a chance that the information might not be fully
complete if an error is printed exactly as the same time as
print_daily_error_info() is run, but I'm not sure it's worth trying to
protect against that race, since the worst that this will mean is a
confusing report in the /var/log/messages file, and the ext4 error
message will be printed right next to it, which will have all of the
information the system administrator will need.

YAAYYY!

sergey senozhatsky's blog

Thursday, September 16, 2010

ext4 regression

No comments:

Post a Comment

About Me

Blog Archive

Total Pageviews