Bug #4278

Wrong OpenNebula DB consistency regarding volatile disk quotas

Added by Esteban Freire Garcia over 5 years ago. Updated about 5 years ago.

Status: Closed
Start date: 01/07/2016
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 5.0
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 4.12

Description

  • We have been observing wrong volatile disk quotas from time to time for some months now.
  • At first we thought it could be due to a crash on the OpenNebula head node or some kind of connectivity loss with the OpenNebula database. But after observing this behaviour for a while now, we can confirm that it is not due to any crash on the head node or any loss of connectivity with the database.
  • Some examples:
  • We executed the onedb fsck command yesterday; this was the output (a hedged sketch of a full invocation follows the listing):

```
onedb fsck [ ... ]
MySQL dump stored in /var/lib/one/mysql_localhost_opennebula.sql_2016-1-6_9:4:43.bck
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file

User 946 quotas: VOLATILE_SIZE_USED has 1024 is 0
User 946 quotas: Image 1781 RVMS has 0 is 1
User 951 quotas: VOLATILE_SIZE_USED has 20480 is 0
User 1207 quotas: VOLATILE_SIZE_USED has 10240 is 0
User 1216 quotas: VOLATILE_SIZE_USED has 10240 is 0
Group 411 quotas: VOLATILE_SIZE_USED has 1024 is 0
Group 411 quotas: Image 1781 RVMS has 0 is 1
Group 420 quotas: VOLATILE_SIZE_USED has 20480 is 0
Group 463 quotas: Datastore 104 IMAGES_USED has -2 is 0
Group 463 quotas: Datastore 104 SIZE_USED has -950 is 0
Group 465 quotas: VOLATILE_SIZE_USED has 10240 is 0
Group 468 quotas: VOLATILE_SIZE_USED has 10240 is 0

Total errors found: 12
A copy of this output was stored in /var/log/one/onedb-fsck.log

onedb fsck [ ... ]
MySQL dump stored in /var/lib/one/mysql_localhost_opennebula.sql_2016-1-6_9:6:0.bck
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file

Total errors found: 0
A copy of this output was stored in /var/log/one/onedb-fsck.log
```
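
For reference, a full invocation looks roughly like this (host, credentials and database name are placeholders, not our real values; oned must be stopped while fsck runs):

```
# Sketch only: adjust connection options to your deployment.
service opennebula stop
onedb fsck -S localhost -P 3306 -u oneadmin -p <password> -d opennebula
service opennebula start
```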

  • According to the previous output, fsck fixed the quota information for group 465 and user 1207, but that was not the case, and we had to run the onedb fsck command again first thing today (a sketch of how to inspect the stored counters directly follows the listing):

```
onedb fsck [ ... ]
MySQL dump stored in /var/lib/one/mysql_localhost_opennebula.sql_2016-1-7_9:0:12.bck
Use 'onedb restore' or restore the DB using the mysql command:
mysql -u user -h server -P port db_name < backup_file

User 794 quotas: VOLATILE_SIZE_USED has 10240 is 0
User 1207 quotas: VOLATILE_SIZE_USED has 71680 is 0
User 1207 quotas: Image 1769 RVMS has 0 is 1
User 1207 quotas: Image 1908 RVMS has 0 is 1
User 1207 quotas: Datastore 106 IMAGES_USED has 1 is 2
User 1207 quotas: Datastore 106 SIZE_USED has 2048 is 62248
Group 331 quotas: VOLATILE_SIZE_USED has 10240 is 0
Group 465 quotas: VOLATILE_SIZE_USED has 71680 is 0
Group 465 quotas: Image 1769 RVMS has 0 is 1
Group 465 quotas: Image 1908 RVMS has 0 is 1
Group 465 quotas: Datastore 106 IMAGES_USED has 2 is 3
Group 465 quotas: Datastore 106 SIZE_USED has 12048 is 72248

Total errors found: 12
A copy of this output was stored in /var/log/one/onedb-fsck.log
```
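
The counters fsck is correcting can also be read straight from the database; a minimal sketch, assuming the 4.x schema and a MySQL backend (credentials and database name are placeholders):

```
# Sketch only: quota counters are stored as XML blobs in the
# user_quotas / group_quotas tables (4.x schema assumed).
mysql -u oneadmin -p opennebula \
      -e "SELECT body FROM group_quotas WHERE group_oid = 465\G"
# Compare the VOLATILE_SIZE_USED element inside <VM_QUOTA> with the
# value fsck reports above.
```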

  • As can be seen, group 465 still had the same issue as yesterday, which was supposed to have been fixed. After this second run, these groups show the right volatile disk quota. Some examples:

```
[root@opennebula4 ~]# onegroup show 331
GROUP 331 INFORMATION
ID : 331
NAME :

GROUP TEMPLATE
GROUP_ADMIN_DEFAULT_VIEW="groupadmin"
GROUP_ADMIN_VIEWS="groupadmin"
SUNSTONE_VIEWS="cloud,user"

USER ID ADMIN
794 *

RESOURCE USAGE & QUOTAS

NUMBER OF VMS               MEMORY                  CPU        VOLATILE_SIZE
        2 / -          32.3G / -           100.00 / -             0M / 10G

DATASTORE ID           IMAGES                 SIZE
         104           2 / -          1020M / 15G
         106           0 / -             0M /  5G

NETWORK ID               LEASES
         0               2 / -

IMAGE ID          RUNNING VMS
     557                1 / -
    1073                1 / -
```

```
[root@opennebula4 ~]# onegroup show 465
GROUP 465 INFORMATION
ID : 465
NAME :

GROUP TEMPLATE
GROUP_ADMIN_DEFAULT_VIEW="user"
GROUP_ADMIN_VIEWS="groupadmin"
SUNSTONE_VIEWS="cloud,user"

USER ID ADMIN
1207 *
1208 *

RESOURCE USAGE & QUOTAS

NUMBER OF VMS               MEMORY                  CPU        VOLATILE_SIZE
        1 / -           950G / -            40.00 / -             0M / 100G

DATASTORE ID           IMAGES                 SIZE
         119           1 / -           510M / 40G
         106           3 / -          70.6G /  2T

NETWORK ID               LEASES
         0               1 / -

IMAGE ID          RUNNING VMS
    1785                1 / -
    1769                1 / -
    1908                1 / -
```
  • But there was no crash or connectivity loss between yesterday's onedb fsck run and today's, which is why we consider this a bug.

Thanks in advance,
Esteban

Associated revisions

Revision 87709d93
Added by Ruben S. Montero about 5 years ago

bug #4278: Sync volatile disk computations in core & fsck

History

#1 Updated by Anonymous over 5 years ago

We are still seeing this with release 4.14.
The result is that we need to fsck the database twice a week due to inconsistent states; we schedule the run in a maintenance window, along the lines of the sketch below.

Esteban told me this issue is scheduled to be fixed in version 5. Can I get any confirmation of that (to prevent it being forgotten)?
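
A minimal sketch of the twice-weekly workaround (paths and credentials are placeholders; oned must be stopped while fsck runs):

```
#!/bin/sh
# Hypothetical wrapper, e.g. /usr/local/sbin/onedb-fsck-run.sh, scheduled
# from cron:  0 4 * * 1,4  root  /usr/local/sbin/onedb-fsck-run.sh
service opennebula stop || exit 1
onedb fsck -S localhost -u oneadmin -p <password> -d opennebula
service opennebula start
```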

#2 Updated by Ruben S. Montero over 5 years ago

  • Target version set to Release 5.0

Target version is now set to 5.0 in Redmine. We'll look at it, thanks!

#3 Updated by Esteban Freire Garcia over 5 years ago

Thank you very much Ruben :)

#4 Updated by Ruben S. Montero about 5 years ago

  • Assignee set to Carlos Martín

#5 Updated by Ignacio M. Llorente about 5 years ago

  • Assignee changed from Carlos Martín to Ruben S. Montero

#6 Updated by Ruben S. Montero about 5 years ago

  • Status changed from Pending to Closed
  • Resolution set to fixed

The volatile disk size was computed slightly differently in core and fsck. This could have an impact on datastores that clone onto themselves, like Ceph... I think this can be the problem. I am closing this; if the patch does not solve it we'll reopen. The sketch below illustrates the kind of divergence.
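
For illustration only, here are two plausible but non-identical ways of selecting a VM's volatile disks from its XML body (the XPath predicates are assumptions, not the actual core/fsck code); if the two sides of the accounting use different predicates, the totals drift exactly as reported above:

```
# Illustration: predicates are assumptions, not OpenNebula's real code.
# Volatile disks carry no IMAGE_ID and have TYPE "fs" or "swap"; if one
# side sums them one way and the other side another, VOLATILE_SIZE_USED
# drifts between the core counters and what fsck recomputes.
xmllint --xpath 'sum(//TEMPLATE/DISK[not(IMAGE_ID)]/SIZE)' vm_body.xml
xmllint --xpath 'sum(//TEMPLATE/DISK[TYPE="fs" or TYPE="swap"]/SIZE)' vm_body.xml
```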
