Monday, August 23, 2010

touch_(nmi|softlockup)_watchdog

Hello,

It is a mistake to think you can solve any major problems just with potatoes.
Douglas Adams

[   67.703556] BUG: using smp_processor_id() in preemptible [00000000] code
[   67.703563] caller is touch_nmi_watchdog+0x15/0x2c
[   67.703568] Call Trace:
[   67.703575]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
[   67.703578]  [<ffffffff81092766>] touch_nmi_watchdog+0x15/0x2c
[   67.703584]  [<ffffffff81222950>] acpi_os_stall+0x34/0x40
[   67.703589]  [<ffffffff812398d2>] acpi_ex_system_do_stall+0x34/0x38
[   67.703591]  [<ffffffff81238396>] acpi_ex_opcode_1A_0T_0R+0x6d/0xa1
[   67.703595]  [<ffffffff8122e280>] acpi_ds_exec_end_op+0xf8/0x578
[   67.703598]  [<ffffffff812457f9>] acpi_ps_parse_loop+0x88a/0xa55
[   67.703604]  [<ffffffff81244a00>] acpi_ps_parse_aml+0x104/0x3c4
[   67.703607]  [<ffffffff81246198>] acpi_ps_execute_method+0x20f/0x2f3
[   67.703610]  [<ffffffff8124021f>] acpi_ns_evaluate+0x18b/0x2d2
[   67.703614]  [<ffffffff8123fad0>] acpi_evaluate_object+0x1b8/0x2fc
[   67.703617]  [<ffffffff8123e020>] ? acpi_get_sleep_type_data+0x21c/0x236
[   67.703620]  [<ffffffff8123d9fb>] acpi_enter_sleep_state_prep+0x61/0xd9
[   67.703623]  [<ffffffff81224205>] acpi_sleep_prepare+0x4f/0x56
[   67.703626]  [<ffffffff81224268>] __acpi_pm_prepare+0x13/0x2e
[   67.703629]  [<ffffffff81224448>] acpi_pm_prepare+0xe/0x1f
[   67.703632]  [<ffffffff81224466>] acpi_hibernation_pre_snapshot+0xd/0x1e
[   67.703637]  [<ffffffff81071b80>] hibernation_snapshot+0xaf/0x258
[   67.703641]  [<ffffffff81074dca>] snapshot_ioctl+0x25c/0x547
[   67.703645]  [<ffffffff81056efc>] ? __srcu_read_unlock+0x3b/0x57
[   67.703649]  [<ffffffff810e7f7d>] vfs_ioctl+0x31/0xa2
[   67.703652]  [<ffffffff810e88dc>] do_vfs_ioctl+0x47c/0x4af
[   67.703655]  [<ffffffff8125ee3c>] ? n_tty_write+0x0/0x35e
[   67.703659]  [<ffffffff8100203a>] ? sysret_check+0x2e/0x69
[   67.703662]  [<ffffffff810e8960>] sys_ioctl+0x51/0x75
[   67.703665]  [<ffffffff81002002>] system_call_fastpath+0x16/0x1b
[...]
[   67.703668] BUG: using smp_processor_id() in preemptible [00000000] code
[   67.703670] caller is touch_softlockup_watchdog+0x15/0x2b
[   67.703674] Call Trace:
[   67.703677]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
[   67.703680]  [<ffffffff8109273b>] touch_softlockup_watchdog+0x15/0x2b
[   67.703682]  [<ffffffff81092779>] touch_nmi_watchdog+0x28/0x2c
[...]



Sometimes things are way much complicated than you may think at first.
The solution is pretty obvious... and pretty wrong at the same time.

Frederic Weisbe wrote:
[..]
It is buggy by nature.
[..]
The problem is on the caller. Considering such udelays loop:

* if it's in a irq disabled section, call touch_nmi_watchdog(), because this
  could prevent the nmi watchdog irq from firing
* if it's in a non-preemptable section, call touch_softlockup_watchdog(), because
  this could prevent the softlockup watchdog task from beeing scheduled
* if it's from a preemptable task context, this should call cond_resched() to
  avoid huge latencies on !CONFIG_PREEMPT

But acpi_os_stall() seem to be called from 4 different places, and these places
may run in different context like the above described.

It means that get_cpu()/put_cpu() are just masking the problem, despite the fact that what we actually need is to fix the problem. And it wasn't obvious to me (little, silly me).

So, we have git reset to previous code in touch_(nmi|softlockup)_watchdog.

No comments:

Post a Comment