Thursday, September 16, 2010

ext4 regression

Hello,

Commit 66e61a9e9504f61b9a928c9055368c81da613a50 intorduced
ext4: Once a day, printk file system error information to dmesg

via kernel timer.

Error report may look like
[  313.485876] EXT4-fs (sda6): error count: 13
[  313.485887] EXT4-fs (sda6): initial error at 1283093815: ext4_lookup:1052: inode 4980737
[  313.485895] EXT4-fs (sda6): last error at 1283094174: ext4_lookup:1052: inode 4980737


and I find it quite useful.

Sad but true - calling print_daily_error_info (by timer event)
on umounted fs will cause NULL pointer derefernce on superblock inode
(EXT4_SB(sb) returns NULL) resulting in OOPS (fatal error during
soft IRQ).

Stack trace will look similar to this one:
(lots of helpfull info cut)

IRQ
        run_timer_softirq
        ?run_timer_softirq
        ?print_daily_error_info
        ?__do_softirq
        __do_softirq
        call_softirq
        do_softirq
        irq_exit
        smp_apic_timer_interrupt
        apic_timer_interrupt
EOI
        intel_idle
        intel_idel
        cpuidle_idle_call
        cpu_idle
        start_secondary



And the solution is:

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2614774..751997d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -719,6 +719,7 @@ static void ext4_put_super(struct super_block *sb)
             ext4_abort(sb, "Couldn't clean up the journal");
     }

+    del_timer(&sbi->s_err_report);
     ext4_release_system_zone(sb);
     ext4_mb_release(sb);
     ext4_ext_release(sb);



Ted Ts'o wrote:
Good catch!  Thanks for the patch.  I will include this into ext4
tree, and I will probably push it separately to Linus so that it gets
into 2.6.36, since this is a regresssion.

I had some concerns about print_daily_error_info
> By the way, isn't print_daily_error_info racy? Is it safe to call
> print_daily_error_info
> (by timer event (softirq)) when we're remounting fs, etc.?


Ted Ts'o answered:
It should be fine.  Remounting doesn't actually change out the struct
superblock.  There is a chance that the information might not be fully
complete if an error is printed exactly as the same time as
print_daily_error_info() is run, but I'm not sure it's worth trying to
protect against that race, since the worst that this will mean is a
confusing report in the /var/log/messages file, and the ext4 error
message will be printed right next to it, which will have all of the
information the system administrator will need.

YAAYYY!

No comments:

Post a Comment