Friday, March 25, 2011

[OOPS] elevator private data for REQ_FLUSH requests

Commit
    9d5a4e946ce5352f19400b6370f4cd8e72806278
    block: skip elevator data initialization for flush requests

    Skip elevator initialization for flush requests by passing priv=0 to
    blk_alloc_request() in get_request().  As such elv_set_request() is
    never called for flush requests.

introduced priv flag, to skip elevator_private data init for block FLUSH 
requests. This lead to a NULL pointer deref on my machine in cfq_insert_request,
which requires elevator_private to be set:

  1 [   78.982169] Call Trace:                                                                                                                                                                                                     
  2 [   78.982178]   cfq_insert_request+0x4e/0x47d
  3 [   78.982184]   ? do_raw_spin_lock+0x6b/0x122
  4 [   78.982189]   elv_insert+0x212/0x265
  5 [   78.982192]   __elv_add_request+0x50/0x52
  6 [   78.982195]   flush_plug_list+0xce/0x12f
  7 [   78.982199]   __blk_flush_plug+0x15/0x21
  8 [   78.982205]   schedule+0x43e/0xbea
  9 [   78.982211]   ? __lock_acquire+0x149d/0x1576
 10 [   78.982215]   ? drive_stat_acct+0x1b6/0x1c3
 11 [   78.982218]   ? drive_stat_acct+0x44/0x1c3
 12 [   78.982223]   ? __make_request+0x268/0x2bf
 13 [   78.982226]   schedule_timeout+0x35/0x3b8
 14 [   78.982230]   ? mark_held_locks+0x4b/0x6d
 15 [   78.982234]   ? _raw_spin_unlock_irq+0x28/0x56
 16 [   78.982239]   ? get_parent_ip+0xe/0x3e
 17 [   78.982244]   ? sub_preempt_count+0x90/0xa3
 18 [   78.982247]   wait_for_common+0xc3/0x141
 19 [   78.982251]   ? default_wake_function+0x0/0xf
 20 [   78.982254]   wait_for_completion+0x18/0x1a
 21 [   78.982258]   blkdev_issue_flush+0xcb/0x11a
 22 [   78.982264]   ext4_sync_file+0x2b3/0x302
 23 [   78.982268]   vfs_fsync_range+0x55/0x75
 24 [   78.982271]   generic_write_sync+0x3f/0x41
 25 [   78.982278]   generic_file_aio_write+0x8c/0xb9
 26 [   78.982281]   ext4_file_write+0x1dc/0x237
 27 [   78.982285]   ? do_raw_spin_lock+0x6b/0x122
 28 [   78.982288]   ? ext4_file_write+0x0/0x237
 29 [   78.982292]   do_sync_readv_writev+0xb4/0xf9
 30 [   78.982298]   ? security_file_permission+0x1e/0x84
 31 [   78.982302]   ? rw_verify_area+0xab/0xc8
 32 [   78.982305]   do_readv_writev+0xb8/0x17d
 33 [   78.982309]   ? fget_light+0x166/0x30b
 34 [   78.982312]   vfs_writev+0x40/0x42
 35 [   78.982315]   sys_pwritev+0x55/0x99
 36 [   78.982320]   system_call_fastpath+0x16/0x1b
 37 
 
My solution was to use ELEVATOR_INSERT_FLUSH flag as an elv_insert param 
for REQ_FLUSH | REQ_FUA requests (lkml)

---
 block/elevator.c |    2 ++
@@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
    q->end_sector = rq_end_sector(rq);
    q->boundary_rq = rq;
   }
+ } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
+  where = ELEVATOR_INSERT_FLUSH;
  } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
       where == ELEVATOR_INSERT_SORT)
   where = ELEVATOR_INSERT_BACK;



Jens Axboe has came up with more high-level solution:
 @@ -2702,7 +2702,10 @@ static void flush_plug_list(struct blk_plug *plug)
                /*
                 * rq is already accounted, so use raw insert
                 */
-               __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
+               if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA))
+                       __elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
+               else
+                       __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
        }

        if (q) {


(flush_plug_list is one level higher than __elv_add_request)
 
Git pull request message says (lkml)
"Thanks a lot to the people involved with fixing the first issue." 
 
hope he was talking about me...


.

No comments:

Post a Comment