Bug #4298

Deleting lots of images at one time leaves some in place

Added by EOLE Team almost 5 years ago. Updated about 4 years ago.

Status:PendingStart date:01/20/2016
Priority:NormalDue date:
Assignee:Carlos Martín% Done:

0%

Category:Drivers - Storage
Target version:-
Resolution: Pull request:
Affected Versions:OpenNebula 4.14

Description

Hello,

I'm deleting lots of images throught Sunstone and after some time some of the should be deleted images are present with status DELETE@.

Here are the logs from /var/lib/one/oned.log for one of them:

Wed Jan 20 13:57:50 2016 [Z0][ReM][D]: Req:8512 UID:2 ImageInfo invoked , 7854
Wed Jan 20 13:57:50 2016 [Z0][ReM][D]: Req:8512 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>7854</ID>..." 
Wed Jan 20 13:57:50 2016 [Z0][ReM][D]: Req:432 UID:2 ImageDelete invoked , 7854
Wed Jan 20 13:57:50 2016 [Z0][ReM][D]: Req:432 UID:2 ImageDelete result SUCCESS, 7854

But the image is still present:

oneadmin@one-frontend:~$ oneimage show 7854
IMAGE 7854 INFORMATION
ID             : 7854
NAME           : sphynx-2.5.2a1-daily-amd64.fi
USER           : jenkins
GROUP          : mensr
DATASTORE      : image
TYPE           : OS
REGISTER TIME  : 12/15 19:54:47
PERSISTENT     : No
SOURCE         : /var/lib/one//datastores/101/fbceb95d4d622e0952297059753a4468
FSTYPE         : save_as
SIZE           : 40G
STATE          : dele
RUNNING_VMS    : 0
oneadmin@one-frontend:~$ ls -l /var/lib/one//datastores/101/fbceb95d4d622e0952297059753a4468
-rw-r--r-- 1 oneadmin oneadmin 4843503616 janv. 14 19:26 /var/lib/one//datastores/101/fbceb95d4d622e0952297059753a4468

Regards.

datastore-fs-rm.sh Magnifier (3.38 KB) EOLE Team, 09/02/2016 09:09 AM

History

#1 Updated by EOLE Team over 4 years ago

This issue may be related to #2998

#2 Updated by Ruben S. Montero over 4 years ago

In this case, the driver should leave an error message in oned.log:

  • Is there any message starting like?:
 "Error removing image from datastore. Manually remove image source..." 
  • What datastore driver are you using?
  • Is this happening for more than 15 delete operations? (there is a buffer for concurrent image operations set by default to 15)

#3 Updated by EOLE Team over 4 years ago

Ruben S. Montero wrote:

In this case, the driver should leave an error message in oned.log:

  • Is there any message starting like?:
Error removing image from datastore. Manually remove image source..." 

Nothing like this in my entire logs.

  • What datastore driver are you using?
101 image                 4T 26%   -               562 img  fs      qcow2   on
  • Is this happening for more than 15 delete operations? (there is a buffer for concurrent image operations set by default to 15)

I just tried with few images, 6 of them are in DELETE state now (careful IDs are shortened by oneimage):

oneadmin@one-frontend:~$ oneimage list -f STAT=dele
  ID USER       GROUP      NAME            DATASTORE     SIZE TYPE PER STAT RVMS
1124 jenkins    mensr      Windows10.1511- image          40G OS    No dele    0
1270 jenkins    mensr      etb3.amonecole- image          40G OS    No dele    0
1276 jenkins    mensr      etb1.amon-2.6.0 image          40G OS    No dele    0
1279 jenkins    mensr      etb1.scribe-2.6 image          40G OS    No dele    0
1296 jenkins    mensr      etb1.amon-2.5.2 image          40G OS    No dele    0
1320 jenkins    mensr      scribe-2.6.0a7- image          40G OS    No dele    0

Here are the logs around the operation:

Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:6480 UID:2 ImageInfo invoked , 13432
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:6480 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13432</ID..." 
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1168 UID:2 ImageInfo invoked , 13435
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1168 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13435</ID..." 
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:3440 UID:2 ImageInfo invoked , 13301
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:9744 UID:2 ImageInfo invoked , 13362
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:9744 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13362</ID..." 
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:3440 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13301</ID..." 
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:8784 UID:2 ImageDelete invoked , 13435
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1488 UID:2 ImageDelete invoked , 13362
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:6496 UID:2 ImageDelete invoked , 13432
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1568 UID:2 ImageInfo invoked , 12216
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1568 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>12216</ID..." 
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:512 UID:2 ImageDelete invoked , 13301
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:9824 UID:2 ImageDelete invoked , 12216
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:8784 UID:2 ImageDelete result SUCCESS, 13435
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:6496 UID:2 ImageDelete result SUCCESS, 13432
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:1488 UID:2 ImageDelete result SUCCESS, 13362
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:512 UID:2 ImageDelete result SUCCESS, 13301
Mon Jul 25 10:06:22 2016 [Z0][ReM][D]: Req:9824 UID:2 ImageDelete result SUCCESS, 12216
Mon Jul 25 10:06:27 2016 [Z0][InM][D]: Monitoring datastore default (1)
Mon Jul 25 10:06:27 2016 [Z0][InM][D]: Monitoring datastore files (2)
Mon Jul 25 10:06:27 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:28 2016 [Z0][InM][D]: Monitoring datastore iso (102)
Mon Jul 25 10:06:28 2016 [Z0][InM][D]: Monitoring datastore envole-images (104)
Mon Jul 25 10:06:28 2016 [Z0][InM][D]: Monitoring datastore Public Datastore (105)
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore default (1) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore envole-images (104) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore Public Datastore (105) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore iso (102) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore files (2) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
Mon Jul 25 10:06:28 2016 [Z0][ImM][I]: Image successfully removed.
Mon Jul 25 10:06:28 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:28 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
Mon Jul 25 10:06:29 2016 [Z0][ImM][I]: Image successfully removed.
Mon Jul 25 10:06:29 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:29 2016 [Z0][ImM][I]: Image successfully removed.
Mon Jul 25 10:06:29 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:29 2016 [Z0][ImM][I]: Image successfully removed.
Mon Jul 25 10:06:29 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:29 2016 [Z0][ImM][I]: Image successfully removed.
Mon Jul 25 10:06:29 2016 [Z0][InM][D]: Monitoring datastore image (101)
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore system (0) successfully monitored.
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore running (100) successfully monitored.
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore envole-running (103) successfully monitored.
Mon Jul 25 10:06:30 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.

You can see that the 6 images in DELETE state do not have the corresponding ImageDelete invoked , <ID>.

Even 5 minutes after grep -F 'ImageDelete invoked ,11241' /var/log/one/oned.log returns nothing.

Looking around for more logs, I saw in sunstone.log the following:

Mon Jul 25 10:05:43 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:43] "GET / HTTP/1.0" 200 - 0.0147
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /vnet?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.5776
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /onedconf?csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.5676
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.7253
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /host?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.7036
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /group?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.0224
Mon Jul 25 10:05:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:45] "GET /user?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.0463
Mon Jul 25 10:05:46 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:46] "GET /vm?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 1.6494
Mon Jul 25 10:05:56 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:05:56] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4481
Mon Jul 25 10:06:06 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:06] "GET /vm/accounting?start_time=1468829144&end_time=-1&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 21.7914
Mon Jul 25 10:06:13 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:13] "GET /vm/accounting?start_time=1468829145&end_time=-1&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 27.7984
Mon Jul 25 10:06:22 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:22] "DELETE /image/13435 HTTP/1.0" 204 - 0.0618
Mon Jul 25 10:06:22 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:22] "DELETE /image/13432 HTTP/1.0" 204 - 0.0665
Mon Jul 25 10:06:22 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:22] "DELETE /image/13362 HTTP/1.0" 204 - 0.0696
Mon Jul 25 10:06:22 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:22] "DELETE /image/13301 HTTP/1.0" 204 - 0.0726
Mon Jul 25 10:06:22 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:22] "DELETE /image/12216 HTTP/1.0" 204 - 0.0783
Mon Jul 25 10:06:48 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:06:48] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4710
Mon Jul 25 10:07:00 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:07:00] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4276
Mon Jul 25 10:11:42 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:11:42] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4235

Only 5 DELETE requests were made, later I tried to delete the last 6 ones:

Mon Jul 25 10:11:42 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:11:42] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4235
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/12792 HTTP/1.0" 204 - 0.1335
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/12765 HTTP/1.0" 204 - 0.1446
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/11241 HTTP/1.0" 204 - 0.1527
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/12709 HTTP/1.0" 204 - 0.1639
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/12962 HTTP/1.0" 204 - 0.1545
Mon Jul 25 10:21:50 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:50] "DELETE /image/13203 HTTP/1.0" 204 - 0.1593
Mon Jul 25 10:21:59 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:21:59] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4124
Mon Jul 25 10:22:57 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:22:57] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.4676

Regards.

#4 Updated by EOLE Team over 4 years ago

I can reproduce it consistently, here is what I did in Sunstone

  • Create an empty qcow2 disk
  • Clone that disk 5 times
  • Clone the 6 new empty disks 3 times (which results in a set of 6 disks plus 18 disks on which to made some tests)
  • Select the last 18 disks
  • Delete them

In the result, 8 disks are in DELETE state.

Here are the logs:

  • sunstone.log
    Mon Jul 25 10:31:44 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:44] "DELETE /image/13564 HTTP/1.0" 204 - 0.0476
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13565 HTTP/1.0" 204 - 0.0512
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13566 HTTP/1.0" 204 - 0.0529
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13567 HTTP/1.0" 204 - 0.0583
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13568 HTTP/1.0" 204 - 0.0593
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13570 HTTP/1.0" 204 - 0.0464
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13572 HTTP/1.0" 204 - 0.0404
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13569 HTTP/1.0" 204 - 0.0884
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13575 HTTP/1.0" 204 - 0.1380
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13577 HTTP/1.0" 204 - 0.1207
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13573 HTTP/1.0" 204 - 0.3095
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13574 HTTP/1.0" 204 - 0.3066
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13571 HTTP/1.0" 204 - 0.3349
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13576 HTTP/1.0" 204 - 0.2858
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13578 HTTP/1.0" 204 - 0.1656
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13579 HTTP/1.0" 204 - 0.1664
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13581 HTTP/1.0" 204 - 0.1513
    Mon Jul 25 10:31:45 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:31:45] "DELETE /image/13580 HTTP/1.0" 204 - 0.1652
    Mon Jul 25 10:32:02 2016 [I]: 192.168.230.30 - - [25/Jul/2016 10:32:02] "GET /image?timeout=false&csrftoken=f6b90088f62ddca93c320921362827bb HTTP/1.0" 200 - 0.3934
    
  • oned.log
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:7296 UID:2 ImageInfo invoked , 13564
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:7296 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13564</ID..." 
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:3008 UID:2 ImageInfo invoked , 13565
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:3008 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13565</ID..." 
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:6912 UID:2 ImageInfo invoked , 13566
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:6912 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13566</ID..." 
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:6640 UID:2 ImageDelete invoked , 13564
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:4064 UID:2 ImageInfo invoked , 13567
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:5504 UID:2 ImageDelete invoked , 13566
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:4064 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13567</ID..." 
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:3248 UID:2 ImageDelete invoked , 13565
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:64 UID:2 ImageInfo invoked , 13568
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:64 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13568</ID..." 
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:7632 UID:2 ImageDelete invoked , 13567
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:5568 UID:2 ImageDelete invoked , 13568
    Mon Jul 25 10:31:44 2016 [Z0][ReM][D]: Req:6640 UID:2 ImageDelete result SUCCESS, 13564
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3248 UID:2 ImageDelete result SUCCESS, 13565
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5200 UID:2 ImageInfo invoked , 13569
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3856 UID:2 ImageInfo invoked , 13570
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5200 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13569</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3856 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13570</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5504 UID:2 ImageDelete result SUCCESS, 13566
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9424 UID:2 ImageInfo invoked , 13571
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2048 UID:2 ImageDelete invoked , 13570
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9424 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13571</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:8352 UID:2 ImageDelete invoked , 13569
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:208 UID:2 ImageInfo invoked , 13572
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:208 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13572</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7632 UID:2 ImageDelete result SUCCESS, 13567
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9248 UID:2 ImageDelete invoked , 13572
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7312 UID:2 ImageDelete invoked , 13571
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5568 UID:2 ImageDelete result SUCCESS, 13568
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3968 UID:2 ImageInfo invoked , 13573
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3968 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13573</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7552 UID:2 ImageDelete invoked , 13573
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2048 UID:2 ImageDelete result SUCCESS, 13570
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3872 UID:2 ImageInfo invoked , 13574
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5856 UID:2 ImageInfo invoked , 13575
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3872 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13574</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5856 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13575</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:6256 UID:2 ImageDelete invoked , 13574
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9248 UID:2 ImageDelete result SUCCESS, 13572
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2416 UID:2 ImageDelete invoked , 13575
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9136 UID:2 ImageInfo invoked , 13576
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:9136 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13576</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:8352 UID:2 ImageDelete result SUCCESS, 13569
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2496 UID:2 ImageDelete invoked , 13576
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:752 UID:2 ImageInfo invoked , 13577
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:752 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13577</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:192 UID:2 ImageDelete invoked , 13577
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2416 UID:2 ImageDelete result SUCCESS, 13575
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:192 UID:2 ImageDelete result SUCCESS, 13577
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7440 UID:2 ImageInfo invoked , 13578
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7440 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13578</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7776 UID:2 ImageInfo invoked , 13579
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7776 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13579</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5232 UID:2 ImageDelete invoked , 13579
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:176 UID:2 ImageDelete invoked , 13578
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7552 UID:2 ImageDelete result SUCCESS, 13573
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:6256 UID:2 ImageDelete result SUCCESS, 13574
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:7312 UID:2 ImageDelete result SUCCESS, 13571
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:2496 UID:2 ImageDelete result SUCCESS, 13576
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:176 UID:2 ImageDelete result SUCCESS, 13578
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3136 UID:2 ImageInfo invoked , 13580
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:4064 UID:2 ImageInfo invoked , 13581
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:3136 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13580</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:4064 UID:2 ImageInfo result SUCCESS, "<IMAGE><ID>13581</ID..." 
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:6880 UID:2 ImageDelete invoked , 13580
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5232 UID:2 ImageDelete result SUCCESS, 13579
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5264 UID:2 ImageDelete invoked , 13581
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:5264 UID:2 ImageDelete result SUCCESS, 13581
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][ReM][D]: Req:6880 UID:2 ImageDelete result SUCCESS, 13580
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][I]: Image successfully removed.
    Mon Jul 25 10:31:45 2016 [Z0][InM][D]: Monitoring datastore image (101)
    Mon Jul 25 10:31:45 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:45 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:45 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:45 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:46 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:46 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:46 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:46 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:46 2016 [Z0][ImM][D]: Datastore image (101) successfully monitored.
    Mon Jul 25 10:31:47 2016 [Z0][ImM][D]: Datastore system (0) successfully monitored.
    Mon Jul 25 10:31:47 2016 [Z0][ImM][D]: Datastore running (100) successfully monitored.
    Mon Jul 25 10:31:47 2016 [Z0][ImM][D]: Datastore envole-running (103) successfully monitored.
    

Then I tried from the CLI:

oneadmin@one-frontend:~$ oneimage delete 13601..13616

and only the image 13616 is in DELETE state without logs of like ImageDelete invoked , 13616.

Regards.

#5 Updated by EOLE Team over 4 years ago

I don't reproduce it on my 5.0.2-2 but this cluster is not as loaded as my 4.14.

Regards.

#6 Updated by EOLE Team over 4 years ago

EOLE Team wrote:

I don't reproduce it on my 5.0.2-2 but this cluster is not as loaded as my 4.14.

After upgrading my 4.14 to 5.0.2 I can reproduce the same issue, 25 disks stay in DELETE state without being removed.

Regards.

#7 Updated by Ruben S. Montero about 4 years ago

  • Assignee set to Carlos Martín
  • Target version set to Release 5.2

#8 Updated by Ruben S. Montero about 4 years ago

Hi

We've been reviewing the logic and the only reason for the image to stuck in DELETE is because of the drivers does not return anything for the delete operation. So, probably at some point the fs/rm operation is stuck (NFS issue here?). When an Image is in delete state you can retry the delete operation.

Can you look for any blocked process (fs/rm) in your system?

We think that the safer approach is to include logic in the driver to timeout (if the are actually getting stuck, we need to confirm this). Also, if the rm action is failing for some reason we need to further debug this, the rm script is pretty simple.

#9 Updated by EOLE Team about 4 years ago

Ruben S. Montero wrote:

Hi

We've been reviewing the logic and the only reason for the image to stuck in DELETE is because of the drivers does not return anything for the delete operation. So, probably at some point the fs/rm operation is stuck (NFS issue here?). When an Image is in delete state you can retry the delete operation.

Can you look for any blocked process (fs/rm) in your system?

No rm process blocked on my frontend.

We think that the safer approach is to include logic in the driver to timeout (if the are actually getting stuck, we need to confirm this). Also, if the rm action is failing for some reason we need to further debug this, the rm script is pretty simple.

I'll redo the test after adding set -x to the top of remotes/datastore/fs/rm.

Regards.

#10 Updated by EOLE Team about 4 years ago

EOLE Team wrote:

We think that the safer approach is to include logic in the driver to timeout (if the are actually getting stuck, we need to confirm this). Also, if the rm action is failing for some reason we need to further debug this, the rm script is pretty simple.

I'll redo the test after adding set -x to the top of remotes/datastore/fs/rm.

I tried today with 25 images, 6 of them are kept in DELETE state.

I have no log in /var/log/one/one_datastore.log and I don't see any output corresponding to my set -x at the top of remotes/datastore/fs/rm.

How could I provide more information?

Regards.

#11 Updated by Anton Todorov about 4 years ago

Hi,

Are you sure that there are no bash processes calling the delete script hanging around? Recently I hit an limitation when using `set -x` - it filled the stderr buffer and the script process hang in `write(2,"string:...)`.

I would like to suggest to add logging to syslog in the script. Something like:

logger -t one_ds_delete "$ID whatever..." 

Kind Rregards
Anton Todorov

#12 Updated by EOLE Team about 4 years ago

Anton Todorov wrote:

Hi,

Are you sure that there are no bash processes calling the delete script hanging around? Recently I hit an limitation when using `set -x` - it filled the stderr buffer and the script process hang in `write(2,"string:...)`.

I would like to suggest to add logging to syslog in the script. Something like:

Ok, I selected 24 images and I only have 22 lines in my log:

Sep  2 09:59:04 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/a5697429ba941302e40f89bbccfc7057 /var/lib/one//datastores/101/a5697429ba941302e40f89bbccfc7057.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/66fd350223f146b9f6929097f32b4f47 /var/lib/one//datastores/101/66fd350223f146b9f6929097f32b4f47.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/5cdc3d46aa4da008cf0c4aff41d6c65f /var/lib/one//datastores/101/5cdc3d46aa4da008cf0c4aff41d6c65f.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/944db7be2218752e177d21b770a0b339 /var/lib/one//datastores/101/944db7be2218752e177d21b770a0b339.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/050ba34935a7bec256ba7832afce0d3f /var/lib/one//datastores/101/050ba34935a7bec256ba7832afce0d3f.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/1d6571cd30e1ade14f1e20cc224671fa /var/lib/one//datastores/101/1d6571cd30e1ade14f1e20cc224671fa.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/9d7fbba13e8d3b19111b24b09a260308 /var/lib/one//datastores/101/9d7fbba13e8d3b19111b24b09a260308.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/59f58db143a8660f46c50fd8466ba2d4 /var/lib/one//datastores/101/59f58db143a8660f46c50fd8466ba2d4.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/e02a17cb6659c4c97c66fd5daacd9004 /var/lib/one//datastores/101/e02a17cb6659c4c97c66fd5daacd9004.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/e8b54f49c2639b77343dcb800a0d7fcb /var/lib/one//datastores/101/e8b54f49c2639b77343dcb800a0d7fcb.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/399ecad704469f444f6ff7bddc02c29a /var/lib/one//datastores/101/399ecad704469f444f6ff7bddc02c29a.snap
Sep  2 09:59:05 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/8051e6125f9973b26fb4cc108b7ab588 /var/lib/one//datastores/101/8051e6125f9973b26fb4cc108b7ab588.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/1e3f15646f8345e009f29ce97081af77 /var/lib/one//datastores/101/1e3f15646f8345e009f29ce97081af77.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/9af97cbf459c32a77ddf5dc4174a7c97 /var/lib/one//datastores/101/9af97cbf459c32a77ddf5dc4174a7c97.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/56bf063d3be305270f3eabd0823e89de /var/lib/one//datastores/101/56bf063d3be305270f3eabd0823e89de.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/a28b149ac932c08b9c45b378c055b0e3 /var/lib/one//datastores/101/a28b149ac932c08b9c45b378c055b0e3.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/e1c58914270869a7a5405526ffbfcba6 /var/lib/one//datastores/101/e1c58914270869a7a5405526ffbfcba6.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/fd7138998a16527f245828eeea9bf0c7 /var/lib/one//datastores/101/fd7138998a16527f245828eeea9bf0c7.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/c4ea7eea561be5276a1f46be7a12e531 /var/lib/one//datastores/101/c4ea7eea561be5276a1f46be7a12e531.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/6d61bf1008d08a6a345567674353d402 /var/lib/one//datastores/101/6d61bf1008d08a6a345567674353d402.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/855eaebd6bd86b73952eb7195ae06521 /var/lib/one//datastores/101/855eaebd6bd86b73952eb7195ae06521.snap
Sep  2 09:59:06 one-frontend one-ds-delete: rm -rf /var/lib/one//datastores/101/80063547d0995d198260f43a092b0ac4 /var/lib/one//datastores/101/80063547d0995d198260f43a092b0ac4.snap

2 images are in DELETE state.

Looking at Sunstone logs, I have 24 DELETE from my IP at the time I did my last test

oneadmin@one-frontend:~$ grep -a '09:59:.*192\.168\.230\.30.*DELETE' /var/log/one/sunstone.log
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14203 HTTP/1.0" 204 - 0.0291
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14202 HTTP/1.0" 204 - 0.0228
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14201 HTTP/1.0" 204 - 0.0191
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14200 HTTP/1.0" 204 - 0.0238
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14206 HTTP/1.0" 204 - 0.1499
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14198 HTTP/1.0" 204 - 0.0275
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14204 HTTP/1.0" 204 - 0.1692
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14196 HTTP/1.0" 204 - 0.0222
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14197 HTTP/1.0" 204 - 0.0317
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14195 HTTP/1.0" 204 - 0.0212
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14194 HTTP/1.0" 204 - 0.0234
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14193 HTTP/1.0" 204 - 0.0195
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14192 HTTP/1.0" 204 - 0.0332
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14191 HTTP/1.0" 204 - 0.0332
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14199 HTTP/1.0" 204 - 0.2289
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14205 HTTP/1.0" 204 - 0.3656
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14187 HTTP/1.0" 204 - 0.0244
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14186 HTTP/1.0" 204 - 0.0194
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14185 HTTP/1.0" 204 - 0.0185
Fri Sep 02 09:59:04 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:04 +0200] "DELETE /image/14190 HTTP/1.0" 204 - 0.1934
Fri Sep 02 09:59:05 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:05 +0200] "DELETE /image/14183 HTTP/1.0" 204 - 0.0428
Fri Sep 02 09:59:05 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:05 +0200] "DELETE /image/14189 HTTP/1.0" 204 - 0.2634
Fri Sep 02 09:59:05 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:05 +0200] "DELETE /image/14188 HTTP/1.0" 204 - 0.3511
Fri Sep 02 09:59:05 2016 [I]: 192.168.230.30 - - [02/Sep/2016:09:59:05 +0200] "DELETE /image/14184 HTTP/1.0" 204 - 0.3351

I have 24 ImageDelete invoked in /var/log/oned.log too:

Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5840 UID:2 ImageDelete invoked , 14203
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7968 UID:2 ImageDelete invoked , 14202
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:6288 UID:2 ImageDelete invoked , 14201
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7920 UID:2 ImageDelete invoked , 14200
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5808 UID:2 ImageDelete invoked , 14206
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:112 UID:2 ImageDelete invoked , 14198
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:3456 UID:2 ImageDelete invoked , 14204
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8800 UID:2 ImageDelete invoked , 14196
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8368 UID:2 ImageDelete invoked , 14197
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:4912 UID:2 ImageDelete invoked , 14195
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8400 UID:2 ImageDelete invoked , 14194
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:2448 UID:2 ImageDelete invoked , 14193
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5728 UID:2 ImageDelete invoked , 14192
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:9840 UID:2 ImageDelete invoked , 14191
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8704 UID:2 ImageDelete invoked , 14199
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7904 UID:2 ImageDelete invoked , 14205
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:528 UID:2 ImageDelete invoked , 14187
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:4848 UID:2 ImageDelete invoked , 14186
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:2624 UID:2 ImageDelete invoked , 14185
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:720 UID:2 ImageDelete invoked , 14190
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:4208 UID:2 ImageDelete invoked , 14183
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:2192 UID:2 ImageDelete invoked , 14189
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:5920 UID:2 ImageDelete invoked , 14188
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:384 UID:2 ImageDelete invoked , 14184

and 24 ImageDelete result SUCCESS in /var/log/one/oned.log:

oneadmin@one-frontend:~$ grep -a '09:59.*ImageDelete result' /var/log/one/oned.log
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5840 UID:2 ImageDelete result SUCCESS, 14203
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7968 UID:2 ImageDelete result SUCCESS, 14202
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:6288 UID:2 ImageDelete result SUCCESS, 14201
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7920 UID:2 ImageDelete result SUCCESS, 14200
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5808 UID:2 ImageDelete result SUCCESS, 14206
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:112 UID:2 ImageDelete result SUCCESS, 14198
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:3456 UID:2 ImageDelete result SUCCESS, 14204
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8800 UID:2 ImageDelete result SUCCESS, 14196
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8368 UID:2 ImageDelete result SUCCESS, 14197
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:4912 UID:2 ImageDelete result SUCCESS, 14195
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8400 UID:2 ImageDelete result SUCCESS, 14194
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:2448 UID:2 ImageDelete result SUCCESS, 14193
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:5728 UID:2 ImageDelete result SUCCESS, 14192
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:9840 UID:2 ImageDelete result SUCCESS, 14191
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:8704 UID:2 ImageDelete result SUCCESS, 14199
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:7904 UID:2 ImageDelete result SUCCESS, 14205
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:528 UID:2 ImageDelete result SUCCESS, 14187
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:4848 UID:2 ImageDelete result SUCCESS, 14186
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:2624 UID:2 ImageDelete result SUCCESS, 14185
Fri Sep  2 09:59:04 2016 [Z0][ReM][D]: Req:720 UID:2 ImageDelete result SUCCESS, 14190
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:4208 UID:2 ImageDelete result SUCCESS, 14183
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:2192 UID:2 ImageDelete result SUCCESS, 14189
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:5920 UID:2 ImageDelete result SUCCESS, 14188
Fri Sep  2 09:59:05 2016 [Z0][ReM][D]: Req:384 UID:2 ImageDelete result SUCCESS, 14184

Strangely, the log "Removing $SRC from the image repository" (source:src/datastore_mad/remotes/fs/rm@e1f6dee1#L68) provide no log at all, my logger -t one-ds-delete is just after that line.

We may have an issue with logging, so I'll redo the test with logger call at all branches in the script.

#13 Updated by EOLE Team about 4 years ago

I made another test with more logging (datastore-fs-rm.sh) on 48 images, here are the results:

  • Sunstone logs 48 requests:
    oneadmin@one-frontend:~$ grep -ac '11:03.*192\.168\.230\.30.*DELETE' /var/log/one/sunstone.log 
    48
    
  • oned logs 48 ImageDelete invoked:
    oneadmin@one-frontend:~$ grep -ac '11:03.*ImageDelete invoked' /var/log/one/oned.log
    48
    
  • oned logs 48 ImageDelete result
    oneadmin@one-frontend:~$ grep -ac '11:03.*ImageDelete result' /var/log/one/oned.log
    48
    
  • all ImageDelete result are SUCCESS
    oneadmin@one-frontend:~$ grep -ac '11:03.*ImageDelete result SUCCESS' /var/log/one/oned.log
    48
    
  • the rm script was invoked only 37 times:
    oneadmin@one-frontend:~$ grep -c 'one-ds-delete: Start' /var/log/one/one-ds-delete.log
    37
    

So, something is happening between oned and the invocation of the script.

Regards.

#14 Updated by EOLE Team about 4 years ago

EOLE Team wrote:

I made another test with more logging (datastore-fs-rm.sh) on 48 images, here are the results:

  • Sunstone logs 48 requests:
    [...]
  • oned logs 48 ImageDelete invoked:
    [...]
  • oned logs 48 ImageDelete result
    [...]
  • all ImageDelete result are SUCCESS
    [...]
  • the rm script was invoked only 37 times:
    [...]

I forgot to note that 11 images are in DELETE state.

Regards.

#15 Updated by Ruben S. Montero about 4 years ago

Hi,

Can we check if the ruby controller is not losing actions? You may try to increase the number of threads for the driver and let's see if that helps, in oned.conf

DATAST<ORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter -s shared,ssh,ceph,fs_lvm,qcow2" 
]

change -t 15 to -t 30 for example, it should be enough for 20-30 images deletes

Thanks

#16 Updated by Ruben S. Montero about 4 years ago

As a side note, the "ImageDelete SUCCESS" you are seeing are for the API call, i.e. checks passed (image exists, requester has rights to delete the image, the image is not in use...) and the rm operation has started (i.e. the driver has been called). It does not imply that the image is actually deleted, as it is an asynch operation

#17 Updated by EOLE Team about 4 years ago

Ruben S. Montero wrote:

Hi,

Can we check if the ruby controller is not losing actions? You may try to increase the number of threads for the driver and let's see if that helps, in oned.conf
[...]

change -t 15 to -t 30 for example, it should be enough for 20-30 images deletes

I put -t 50 and then restarted oned.

I have 20 images in DELETE state, with 28 invocations of remote/datastores/fs/rm.

Regards.

#18 Updated by Ruben S. Montero about 4 years ago

So this is not the driver getting stuck, I can only think of something happing when removing the files. Can we try to directly remove the files from console (in a set of concurrent operations like rm file file.snap& and so on? Is the storage a disk or it is mounted from somewhere?

#19 Updated by EOLE Team about 4 years ago

Ruben S. Montero wrote:

So this is not the driver getting stuck, I can only think of something happing when removing the files.

But the logger -t one-ds-delete "Start datastore RM script $1 $2" at the top of the script should have produce a line even if something failed after.

Can we try to directly remove the files from console (in a set of concurrent operations like rm file file.snap& and so on? Is the storage a disk or it is mounted from somewhere?

Our datastore is on a SAN exported as NFS from hypervisor to the frontend VM, the NFS server is the hypervisor where the frontend VM runs (using pacemaker location affinity).

Here is the try of the concurrent remove:

oneadmin@one-frontend:~$ for img in {14477..14524}; do oneimage show ${img} | awk '/^SOURCE/ {print $3}'; done > files.txt
oneadmin@one-frontend:~$ while read img; do echo "rm -rf ${img} ${img}.snap &"; done > rm_script.sh < files.txt
oneadmin@one-frontend:~$ sh -x ./rm_script.sh
oneadmin@one-frontend:~$ while read img; do ls ${img}; done

All 48 files are removed from the datastore.

Regards.

#20 Updated by Ruben S. Montero about 4 years ago

  • Target version deleted (Release 5.2)

#21 Updated by EOLE Team about 4 years ago

I made a little script to find out image problems, here is the result:

oneadmin@one-frontend:~$ ./tools/delete-inexistant
INFO: delete-inexistant: Could remove inexistent file: 14138 (DELETE) etb3.amonecole-2.6.0b3-instance-default-amd64.vm -> /var/lib/one//datastores/101/e01e6ce147ca811f572255dd348617ca
INFO: delete-inexistant: Could remove inexistent file: 14657 (DELETE) aca.seshat-2.6.0b3-instance-default-amd64.vm -> /var/lib/one//datastores/101/b467cd11f7204a0832f0a7c55c6b0179

Which means that the file is deleted on the filesystem but the deletion is not ACK in the database.

Regards.

Also available in: Atom PDF