Bug #3726
Ceph monitor check broken during normal cluster degradation
| Status: | Closed | Start date: | 03/30/2015 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | Javi Fontan | % Done: | 0% |
| Category: | Documentation | | |
| Target version: | Release 4.12.1 | | |
| Resolution: | | Pull request: | |
| Affected Versions: | OpenNebula 4.12 | | |
Description
There is a bug in the new Ceph monitor. It assumes the column positions in the ceph/rados output are always static; however, when there is any degradation (perfectly normal if you've lost a disk and rebalancing is happening), the awk picks up the wrong column (0 in my case). See https://gist.github.com/joelio/64ae2b9fe9116fcca4c6
This is a serious fault as it stops all provisioning on an otherwise healthy cluster.
I think it'd be more prudent to use the JSON output and parse that properly?
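For illustration, a minimal sketch of the JSON approach (the pool name, the use of jq, and the `.pools[].stats.max_avail` field path are assumptions and may differ between Ceph releases, so verify against real `ceph df -f json` output first):

```bash
#!/bin/bash
# Sketch: read `ceph df` as JSON and select fields by name with jq,
# instead of picking whitespace-separated columns by position with awk.
POOL_NAME="${1:-one}"  # hypothetical pool name; adjust to your datastore pool

# NOTE: the .pools[].stats.max_avail path is an assumption; inspect
# `ceph df -f json` on your Ceph release before relying on it.
ceph df -f json | \
  jq -r --arg p "$POOL_NAME" '.pools[] | select(.name == $p) | .stats.max_avail'
```

Selecting fields by name rather than by column index would make the monitor immune to extra columns appearing during degradation.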
History
#1 Updated by Joel Merrick over 6 years ago
I've been notified that XML output is supported:
ceph df -f xml
It might be better than JSON for ONE usage.
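A rough sketch of the same lookup against the XML form (xmllint ships with libxml2; the element names below are assumptions mirroring the JSON structure, so check them against real `ceph df -f xml` output first):

```bash
#!/bin/bash
# Sketch: the same pool lookup against `ceph df -f xml` via XPath.
# Element names are assumptions -- verify against your Ceph release's output.
POOL_NAME="${1:-one}"
ceph df -f xml | \
  xmllint --xpath "string(//pools/pool[name='$POOL_NAME']/stats/max_avail)" -
```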
#2 Updated by Ruben S. Montero over 6 years ago
Are you running 0.87.1?
Anyway, using XML/JSON should be better
#3 Updated by Ruben S. Montero over 6 years ago
- Target version set to Release 4.14
#4 Updated by Joel Merrick over 6 years ago
Yea, current stable (0.87 Giant)
ceph:
  Installed: 0.87.1-1trusty
  Candidate: 0.87.1-1trusty
  Version table:
 *** 0.87.1-1trusty 0
        999 http://ceph.com/debian-giant/ trusty/main amd64 Packages
#5 Updated by Ruben S. Montero over 6 years ago
- Status changed from Pending to New
- Target version changed from Release 4.14 to Release 4.12.1
#6 Updated by Ruben S. Montero over 6 years ago
- Assignee set to Javi Fontan
#7 Updated by Ruben S. Montero over 6 years ago
Hi,
Checking it; I have:

# rados df
pool name  category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB  ...

but with ceph:

# ceph df
GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED
    ...
POOLS:
    NAME  ID  USED  %USED  MAX AVAIL  OBJECTS
    ...

So you say that ceph df becomes rados df under degradation?
Cheers
#8 Updated by Joel Merrick over 6 years ago
I'm noticing no MAX AVAIL in my output either:
root@vm-head-01:/var/log/one# ceph df
GLOBAL:
    SIZE    AVAIL   RAW USED  %RAW USED
    59304G  53306G  5960G     10.05
POOLS:
    NAME             ID  USED    %USED  MAX AVAIL  OBJECTS
    rbd              0   0       0      0          0
    cephfs_data      1   1930G   3.26   0          2062873
    cephfs_metadata  2   38567k  0      0          75344
    one              3   44924M  0.07   0          11350
#10 Updated by Joel Merrick over 6 years ago
This is a fresh install too; nothing further has been done with Ceph for OpenNebula apart from setting up auth.
If it's just a case of getting MAX AVAIL working, then we need to understand what's required and add it to the docs :)
#11 Updated by Joel Merrick over 6 years ago
Ok, so I'm thinking this is a bug in Ceph now, sorry for the confusion!
#12 Updated by Joel Merrick over 6 years ago
Yes, I can confirm this bug with Ceph, not ONE.
You can close (or perhaps mark documentation?) this bug as there's not a lot you guys can do to mitigate.
Basically, ensure ALL OSDs are in and up, or do not use 0.87.1 specifically; it looks fixed in later releases.
Thanks again guys! :)
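For reference, a quick way to confirm that all OSDs are in and up before trusting the monitor output (standard ceph CLI; the JSON field names are an assumption and may vary by release):

```bash
# All three counts should match: total == up == in.
ceph osd stat
# e.g. "osdmap e128: 26 osds: 26 up, 26 in"

# Scripted check -- exits non-zero if any OSD is down or out.
# NOTE: num_osds/num_up_osds/num_in_osds names may vary by Ceph release.
ceph osd stat -f json | \
  jq -e '.num_osds == .num_up_osds and .num_osds == .num_in_osds' >/dev/null
```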
#13 Updated by Ruben S. Montero over 6 years ago
- Category changed from Drivers - Storage to Documentation
Great! I'm not 100% sure, but I think this bug also appears when the OSDs are weighted to 0... A note in the docs would be nice, at least on the release notes pages.
Thanks for letting us know
Cheers
Joel Merrick wrote:
> Yes, I can confirm this bug with Ceph, not ONE.
> You can close (or perhaps mark documentation?) this bug as there's not a lot you guys can do to mitigate.
> Basically, ensure ALL OSDs are in and up, or do not use 0.87.1 specifically; it looks fixed in later releases.
> Thanks again guys! :)
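To spot the weighted-to-0 case mentioned above, the WEIGHT column of `ceph osd tree` shows the CRUSH weight of every OSD:

```bash
# Every OSD expected to hold data should show a non-zero WEIGHT
# and a status of "up".
ceph osd tree
```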
#14 Updated by Ruben S. Montero about 6 years ago
- Status changed from New to Closed
- Resolution set to worksforme
#15 Updated by Ruben S. Montero about 6 years ago
- Status changed from Closed to Pending
- Resolution deleted (worksforme)