Question
What are zombie disks and how do I remove them?
Environment
Integrated Storage
Answer
Zombie disks are disks that do not show up in the OnApp database.
Typically they are snapshots of a VM's disk, which OnApp takes to make backups.
They become zombie disks when a backup process completes or fails and the system cannot remove them.
To clean up the Integrated Storage snapshot (zombie disk):
1. Try the "Some zombie disks found clean up option (Delete ALL)" in the Disk Health section:
https://yourwebsite.com/storage/3/health_checks#
2. If that fails, backend cleanup will be needed:
This can be done from any compute resource or backup server in the zone.
- Look up the zombie disk info:
onappstore diskinfo uuid=(zombie disk identifier here) readable=true
and look for the "parent" line:
parent = xxxxxxxxxxxx
If the disk has a parent parameter, it is a backup snapshot (zombie disk) and is safe to remove.
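The parent check can be scripted. The following is a minimal sketch, assuming the diskinfo output uses the "parent = value" layout shown above; has_parent is a hypothetical helper name, not part of onappstore.

```shell
# has_parent: hypothetical helper, not an onappstore command.
# Reads diskinfo output on stdin and returns 0 when it contains a
# non-empty "parent" line, assuming the "parent = <value>" layout.
has_parent() {
    grep -Eq '^[[:space:]]*parent[[:space:]]*=[[:space:]]*[^[:space:]]+'
}

# Possible usage on a compute resource (UUID is a placeholder variable):
# onappstore diskinfo uuid="$UUID" readable=true | has_parent \
#     && echo "backup snapshot, safe to remove"
```

The helper only inspects text, so it can be tried against saved diskinfo output before it is used on a live system.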
- Once you have confirmed the snapshot identifier, take the disk offline:
onappstore offline uuid=(zombie disk identifier here)
Taking the disk offline may fail, in which case the output gives the reason.
Sample error:
[root@0.0.0.00 ~]# onappstore offline uuid=xxxxxxxxxxx
result=FAILURE error=onappstore offlineVDisk xxxxxxxxx failed on frontend 2956989790 with error map: [] and optional error: API call failed for a subset of nodes. Failures: [('2956989790', u'Failed to detect device mapper node [/dev/mapper/4msa57bjlxkciz] disappear. out:, err:None')] completion_time=51
In most cases, when it fails, you will find the solution in this document:
https://help.onapp.com/hc/en-us/articles/222049608--dmsetup-remove-Failure-real-errors-
About 90% of the time, the disk fails to go offline because of a stale mount, device mappers, or stuck processes, which can be found and removed by following the link provided.
Caution: if the stuck process has a /bdev in the process line, it is best to open a support ticket.
The error above also gives the identifier of the compute resource or backup server where the issue is: frontend 2956989790
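To avoid copying the identifier by hand, it can be pulled out of the error text. This is a small sketch that assumes the message keeps the "failed on frontend NNN" wording of the sample error; extract_frontend_id is a hypothetical helper name.

```shell
# extract_frontend_id: hypothetical helper, not an onappstore command.
# Reads an onappstore error message on stdin and prints the numeric
# identifier that follows "failed on frontend". Prints nothing if the
# phrase is absent.
extract_frontend_id() {
    sed -n 's/.*failed on frontend \([0-9][0-9]*\).*/\1/p'
}

# Possible usage (UUID is a placeholder variable):
# onappstore offline uuid="$UUID" | extract_frontend_id
```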
To find which resource this is, run the following command on all compute resources/backup servers:
onappstore getid | grep (frontend identifier)
Example: onappstore getid | grep 2956989790
This will output something like this on the correct resource:
>unicasthosts= backends= ipaddr=10.10.10.10 result=SUCCESS uuid=xxxxxxxxxx completion_time=0
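If there are many resources to check, the matching line can be parsed to get the address to SSH to. The sketch below assumes the space-separated key=value layout of the sample output; getid_field is a hypothetical helper name, and the host loop in the comments is only an illustration with placeholder variables.

```shell
# getid_field: hypothetical helper, not an onappstore command.
# Reads a line of space-separated key=value pairs on stdin and prints
# the value for the key given as the first argument.
getid_field() {
    tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Possible usage on the resource that matched the frontend identifier:
# onappstore getid | grep "$FRONTEND_ID" | getid_field ipaddr
#
# Illustrative loop over a placeholder list of hosts in HOSTS:
# for host in $HOSTS; do
#     ssh "root@$host" onappstore getid | grep -q "$FRONTEND_ID" && echo "$host"
# done
```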
Then SSH to that resource (unless you are already on it) and check for stuck processes, device mappers, or mounts, as described in the document:
https://help.onapp.com/hc/en-us/articles/222049608--dmsetup-remove-Failure-real-errors-
3. Once you believe you have cleaned up all the stuck processes keeping the vdisk active, take the disk offline again:
onappstore offline uuid=(zombie disk identifier here)
4. Remove the zombie disk with the following command:
onappstore delete uuid=(zombie disk identifier here)
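The backend steps above can be combined into one guarded sequence. This is only a sketch under stated assumptions: it treats any output containing result=FAILURE as an error (that string appears in the sample offline error; assuming delete reports failures the same way), and it refuses to continue when no parent line is found. remove_zombie_disk and the ONAPPSTORE variable are hypothetical names introduced here so the flow can be dry-run against a stub.

```shell
# ONAPPSTORE: hypothetical override so the flow can be tested with a stub.
ONAPPSTORE="${ONAPPSTORE:-onappstore}"

# remove_zombie_disk UUID: hypothetical wrapper around the manual steps.
# Return codes: 1 = no parent line, 2 = offline failed, 3 = delete failed.
remove_zombie_disk() {
    uuid="$1"

    # Proceed only if the disk has a parent, i.e. it is a backup snapshot.
    if ! "$ONAPPSTORE" diskinfo uuid="$uuid" readable=true \
        | grep -Eq '^[[:space:]]*parent[[:space:]]*=[[:space:]]*[^[:space:]]+'; then
        echo "no parent found for $uuid; not removing" >&2
        return 1
    fi

    # Take the disk offline and stop on a reported failure.
    out=$("$ONAPPSTORE" offline uuid="$uuid")
    case "$out" in
        *result=FAILURE*) echo "$out" >&2; return 2 ;;
    esac

    # Delete the disk (assumes failures are reported as result=FAILURE).
    out=$("$ONAPPSTORE" delete uuid="$uuid")
    case "$out" in
        *result=FAILURE*) echo "$out" >&2; return 3 ;;
    esac

    echo "removed $uuid"
}
```

If the offline step returns 2, the printed error can be used to locate the stuck resource as described above before running the function again.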