Commit 66e61a9e9504f61b9a928c9055368c81da613a50 intorduced
ext4: Once a day, printk file system error information to dmesg
via kernel timer.
Error report may look like
[ 313.485876] EXT4-fs (sda6): error count: 13
[ 313.485887] EXT4-fs (sda6): initial error at 1283093815: ext4_lookup:1052: inode 4980737
[ 313.485895] EXT4-fs (sda6): last error at 1283094174: ext4_lookup:1052: inode 4980737
and I find it quite useful.
Sad but true - calling print_daily_error_info (by timer event)
on umounted fs will cause NULL pointer derefernce on superblock inode
(EXT4_SB(sb) returns NULL) resulting in OOPS (fatal error during
soft IRQ).
Stack trace will look similar to this one:
(lots of helpfull info cut)
IRQ
run_timer_softirq
?run_timer_softirq
?print_daily_error_info
?__do_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
EOI
intel_idle
intel_idel
cpuidle_idle_call
cpu_idle
start_secondary
And the solution is:
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2614774..751997d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -719,6 +719,7 @@ static void ext4_put_super(struct super_block *sb)
ext4_abort(sb, "Couldn't clean up the journal");
}
+ del_timer(&sbi->s_err_report);
ext4_release_system_zone(sb);
ext4_mb_release(sb);
ext4_ext_release(sb);
Ted Ts'o wrote:
Good catch! Thanks for the patch. I will include this into ext4
tree, and I will probably push it separately to Linus so that it gets
into 2.6.36, since this is a regresssion.
I had some concerns about print_daily_error_info.
> By the way, isn't print_daily_error_info racy? Is it safe to call
> print_daily_error_info
> (by timer event (softirq)) when we're remounting fs, etc.?
Ted Ts'o answered:
It should be fine. Remounting doesn't actually change out the struct
superblock. There is a chance that the information might not be fully
complete if an error is printed exactly as the same time as
print_daily_error_info() is run, but I'm not sure it's worth trying to
protect against that race, since the worst that this will mean is a
confusing report in the /var/log/messages file, and the ext4 error
message will be printed right next to it, which will have all of the
information the system administrator will need.
YAAYYY!
No comments:
Post a Comment