Finding Ceph "slow request" culprit

Tracking down the source of Ceph slow requests

This happens often enough that it's worth writing down. This isn't about finding the true cause of the "slow request" / "request blocked" issue, but it will let you know which data is hung.

So you notice that ceph does not have HEALTH_OK:

    cluster f09294bb-202c-4535-a919-a4acc656ff64
     health HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 1 pgs stuck unclean; 6 requests are blocked > 32 sec; recovery 1/3619221 objects degraded (0.000%)
     monmap e10: 3 mons at {sm1-2=172.20.66.71:6789/0,sm1-4=172.20.66.73:6789/0,sm1-8=172.20.66.77:6789/0}, election epoch 2624, quorum 0,1,2 sm1-2,sm1-4,sm1-8
     mdsmap e470: 1/1/1 up {0=sm1-5=up:active}
     osdmap e61562: 40 osds: 40 up, 40 in
      pgmap v29258664: 12288 pgs, 6 pools, 4715 GB data, 1178 kobjects
            12105 GB used, 25109 GB / 37260 GB avail
            1/3619221 objects degraded (0.000%)
               12287 active+clean
                   1 active+degraded
      client io 1822 kB/s rd, 154 kB/s wr, 135 op/s
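
That summary is the output of ceph status (ceph -s for short); if all you want is the HEALTH_* line for a quick re-check, ceph health by itself is enough:

    # ceph -s
    # ceph health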

Find out which OSD has a request blocked:

    # ceph health detail
    HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 1 pgs stuck unclean; 6 requests are blocked > 32 sec; 1 osds have slow requests; recovery 1/3619221 objects degraded (0.000%)
    pg 3.3a4 is stuck unclean for 17565.234526, current state active+degraded, last acting [13,23,38]
    pg 3.3a4 is stuck degraded for 1667.058610, current state active+degraded, last acting [13,23,38]
    pg 3.3a4 is active+degraded, acting [13,23,38]
    6 ops are blocked > 1048.58 sec
    6 ops are blocked > 1048.58 sec on osd.38
    1 osds have slow requests
    recovery 1/3619221 objects degraded (0.000%)
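
On a busy cluster the health detail output gets long; a quick grep pulls out just the per-OSD blocked-ops lines (the pattern simply matches the wording of the warning above):

    # ceph health detail | grep 'blocked.*osd\.'
    6 ops are blocked > 1048.58 sec on osd.38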

Find which object is affected:

    # grep "slow request" /var/log/ceph/ceph-osd.38.log
    2015-02-08 21:20:40.260508 7fdf7e2b2700  0 log_channel(default) log [WRN] : slow request 960.999831 seconds old, received at 2015-02-08 21:04:39.260556: osd_op(client.29247372.0:19771002 rbd_data.1be46ea2eb141f2.000000000000031f [set-alloc-hint object_size 4194304 write_size 4194304,write 1556480~16384] 3.e4feb3e7 snapc 6a=[6a] ack+ondisk+write+known_if_redirected e61562) currently waiting for subops from 13,25
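
The object name is buried in the middle of that line; when several requests are slow it can be handy to pull the names out directly. A small convenience sketch (the pattern only matches RBD data objects, which is all a cinder/nova-backed pool should contain):

    # grep "slow request" /var/log/ceph/ceph-osd.38.log | grep -oE 'rbd_data\.[0-9a-f]+\.[0-9a-f]+' | sort -u
    rbd_data.1be46ea2eb141f2.000000000000031f

On releases of this vintage the blocked ops can also be inspected live over the OSD's admin socket with ceph daemon osd.38 dump_ops_in_flight (run on the host carrying osd.38), but the log line above already names the object and the OSDs it is waiting on.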

We see that in this case the affected object is rbd_data.1be46ea2eb141f2.000000000000031f. Assuming it lives in the cinder volumes pool, we can find the name of the rbd it belongs to:

    # for rbd in $(rbd ls -p volumes); do rbd info volumes/$rbd; done | grep -B3 -A2 rbd_data.1be46ea2eb141f2
    rbd image 'volume-9ff8463f-0f14-49b1-a478-ffe27d0623af':
        size 8192 MB in 2048 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1be46ea2eb141f2
        format: 2
        features: layering
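
The loop prints the full rbd info block for context; if only the image name is wanted, the match can just as well happen inside the loop (same volumes pool and prefix assumed):

    # for rbd in $(rbd ls -p volumes); do rbd info volumes/$rbd | grep -q rbd_data.1be46ea2eb141f2 && echo $rbd; done
    volume-9ff8463f-0f14-49b1-a478-ffe27d0623af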

Now we can use cinder to get the name of the volume:

    # cinder show 9ff8463f-0f14-49b1-a478-ffe27d0623af
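
cinder show returns a property table; the rows worth checking are the volume's display name and its attachments, which point at the instance using the hung volume. Field names vary between cinder versions, so the grep below is only a rough filter:

    # cinder show 9ff8463f-0f14-49b1-a478-ffe27d0623af | grep -E 'name|attachments'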

If the name had instead been 9ff8463f-0f14-49b1-a478-ffe27d0623af_disk, it would be an ephemeral instance disk, and we would use nova instead:

    # nova show 9ff8463f-0f14-49b1-a478-ffe27d0623af
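
nova show likewise prints a property table; the instance name and the compute host it runs on (standard nova fields, though exact names depend on the release) tell you whose ephemeral disk is stuck:

    # nova show 9ff8463f-0f14-49b1-a478-ffe27d0623af | grep -E 'name|host'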