Issue
CloudBoot hypervisors experience kernel panics, crashing storage controllers, or other unexpected behavior.
Description
When Integrated Storage packages are upgraded, the new kernel and drivers may not work well with the existing hardware.
For such situations there is already a documentation article that explains how to customize the drivers: https://docs.onapp.com/display/IS/Customizable+CloudBoot+Images
But what if the state of the Integrated Storage packages should be reverted partly or fully?
For example, some hypervisors in some specific zone are not working correctly with the updated kernel/drivers/software and you need to revert the aforementioned packages back to the working state.
Environment
Virtually any environment; the tested cases start from OnApp 4.1.
Implementation
To keep the other hypervisors, which were upgraded to the latest kernel and software without problems, in a working state, there are two main ways to handle this issue:
- Take a complete backup of the /tftpboot directory prior to the upgrade, so that the exact PXE boot records and boot images of the problematic hypervisors can be restored.
- Or download the necessary CloudBoot package from the repositories at http://cdn.rpm.repo.onapp.com/repo/centos/$releasever/$basearch/RPMS-x.x/ and retrieve the necessary boot images from it.
Both ways involve the same change to the PXE boot record under /tftpboot/pxelinux.cfg/.
Resolution
The proactive resolution starts with backing up the /tftpboot directory before initiating the upgrade.
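A minimal sketch of such a backup, assuming tar with gzip support is available on the Control Panel server. The function name backup_tftpboot and the destination directory in the usage line are illustrative and not part of OnApp:

```shell
# Hypothetical helper: archive a tftpboot tree before an upgrade.
# Prints the path of the archive it created.
backup_tftpboot() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/tftpboot-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the parent of $src;
    # symlinks such as images/centos6/ramdisk-kvm/vmlinuz are stored as symlinks
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

It could be called as `backup_tftpboot /tftpboot /root/tftpboot-backups` before the upgrade; the archive then contains both the pxelinux.cfg records and the boot images.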
The reactive resolution starts with going to http://rpm.repo.onapp.com/repo/ and finding the necessary package.
The next steps will be described for the latter.
In this example, we download the storage package described in https://docs.onapp.com/display/RN/OnApp+4.1.0-14+Storage+Update on a cloud that has been updated to 5.3.
[root@cp ~]# mkdir /tmp/old_storage_image
[root@cp ~]# cd /tmp/old_storage_image/
[root@cp old_storage_image]# ls
[root@cp old_storage_image]# wget http://rpm.repo.onapp.com/repo/centos/6/x86_64/RPMS-4.1/onapp-store-install-4.1.0-14.noarch.rpm
--2017-04-26 10:28:17-- http://rpm.repo.onapp.com/repo/centos/6/x86_64/RPMS-4.1/onapp-store-install-4.1.0-14.noarch.rpm
Resolving rpm.repo.onapp.com (rpm.repo.onapp.com)... 148.251.136.68
Connecting to rpm.repo.onapp.com (rpm.repo.onapp.com)|148.251.136.68|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1110480020 (1.0G) [application/x-redhat-package-manager]
Saving to: ‘onapp-store-install-4.1.0-14.noarch.rpm’
100%[==================================================================================================================================================================================================>] 1,110,480,020 2.57MB/s in 7m 20s
2017-04-26 10:35:37 (2.41 MB/s) - ‘onapp-store-install-4.1.0-14.noarch.rpm’ saved [1110480020/1110480020]
[root@cp old_storage_image]# ls
onapp-store-install-4.1.0-14.noarch.rpm
[root@cp old_storage_image]# rpm2cpio onapp-store-install-4.1.0-14.noarch.rpm |cpio -idmv
./onapp/onapp-store-install
./onapp/onapp-store-install/files
...
./usr/pythoncontroller/diskutil.pyc
./usr/pythoncontroller/python
3090501 blocks
[root@cp old_storage_image]#
Next, we need to locate the images that will be used to boot the problematic hypervisors.
In our example, the hypervisors were crashing on the latest storage package.
[root@cp old_storage_image]# ls tftpboot/images/centos6/ramdisk-kvm/
initrd.img liveupdate-storagenode.tgz liveupdate.tgz vmlinuz vmlinuz-2.6.32-358.6.2.el6.x86_64 vmlinuz-2.6.32-431.29.2.el6.x86_64
[root@cp old_storage_image]#
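When a directory holds several kernels, as in the listing above, it helps to see which one the vmlinuz link resolves to. The following is a small sketch, assuming GNU readlink; the function name list_kernels is hypothetical:

```shell
# Hypothetical helper: list the kernel images in a ramdisk directory
# and mark the one that the vmlinuz symlink points to.
list_kernels() {
    dir=$1
    current=$(readlink -f "$dir/vmlinuz")
    for k in "$dir"/vmlinuz-*; do
        # skip the unexpanded pattern when no versioned kernel exists
        [ -e "$k" ] || continue
        if [ "$(readlink -f "$k")" = "$current" ]; then
            echo "$(basename "$k") (default)"
        else
            basename "$k"
        fi
    done
}
```

For the extracted package this would be run as `list_kernels tftpboot/images/centos6/ramdisk-kvm`.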
The next step is to edit the PXE boot configuration for the specific problematic hypervisor.
Locate the management MAC address on the hypervisor's edit page (https://docs.onapp.com/display/53AG/Edit+CloudBoot+Compute+Resource), then go to the corresponding configuration file:
[root@cp old_storage_image]# cat /tftpboot/pxelinux.cfg/01-00-0c-29-09-9a-33
default centos6-ramdisk-kvm
label centos6-ramdisk-kvm
kernel images/centos6/ramdisk-kvm/vmlinuz
append initrd=images/centos6/ramdisk-kvm/initrd.img NFSNODEID=00-0c-29-09-9a-33 NFSROOT=10.76.0.100:/tftpboot/export/centos6/kvm CFGROOT=10.76.0.100:/tftpboot/images/centos5/diskless/snapshot ADDTOBRIDGE=MGT pcie_aspm=off selinux=0 cgroup_disable=memory
[root@cp old_storage_image]#
and change the path to the kernel and initrd images:
>kernel images/centos6/ramdisk-kvm/vmlinuz
>initrd=images/centos6/ramdisk-kvm/initrd.img
[root@cp ramdisk-kvm]# mkdir /tftpbootold/
[root@cp old_storage_image]# cp -r tftpboot/images/ /tftpbootold/
[root@cp old_storage_image]# readlink -f /tftpbootold/images/centos6/ramdisk-kvm/vmlinuz
/tftpbootold/images/centos6/ramdisk-kvm/vmlinuz-2.6.32-431.29.2.el6.x86_64
[root@cp old_storage_image]# readlink -f /tftpbootold/images/centos6/ramdisk-kvm/initrd.img
/tftpbootold/images/centos6/ramdisk-kvm/initrd.img
[root@cp old_storage_image]# ln -s /tftpbootold/images/centos6/ /tftpboot/images/centos6old
The following step is necessary to keep the old images in place after the next upgrade:
[root@cp ~]# mkdir -p /tftpboot/images/centos6old/
[root@cp ~]# mount --rbind /tftpbootold/images/centos6/ /tftpboot/images/centos6old/
[root@cp ~]# ls -la /tftpboot/images/centos6old/
total 16
drwxr-xr-x 4 root root 4096 Apr 26 14:28 .
drwxr-xr-x 8 root root 4096 Apr 26 14:19 ..
drwxr-xr-x 2 root root 4096 Apr 26 14:28 ramdisk-kvm
drwxr-xr-x 2 root root 4096 Apr 26 14:28 ramdisk-xen
[root@cp ~]#
The new paths should look like the output below:
[root@cp old_storage_image]# cat /tftpboot/pxelinux.cfg/01-00-0c-29-09-9a-33
default centos6-ramdisk-kvm
label centos6-ramdisk-kvm
kernel images/centos6old/ramdisk-kvm/vmlinuz
append initrd=images/centos6old/ramdisk-kvm/initrd.img NFSNODEID=00-0c-29-09-9a-33 NFSROOT=10.76.0.100:/tftpboot/export/centos6/kvm CFGROOT=10.76.0.100:/tftpboot/images/centos5/diskless/snapshot ADDTOBRIDGE=MGT pcie_aspm=off selinux=0 cgroup_disable=memory
[root@cp old_storage_image]#
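Before rebooting, the edited record can be double-checked with a short script. The sketch below relies on the PXELINUX convention that a per-host file is named 01- followed by the MAC address in lowercase with dashes, and it assumes the kernel and initrd paths are relative to the TFTP root, as in the output above. Both function names are hypothetical:

```shell
# Hypothetical helpers, not part of OnApp.

# Build the per-host PXELINUX file name from a MAC address:
# "01-" (Ethernet) followed by the MAC in lowercase, dash-separated.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Check that the kernel and initrd named in a host's PXE record
# exist under the TFTP root. Usage: check_pxe_paths /tftpboot MAC
check_pxe_paths() {
    root=$1
    cfg="$root/pxelinux.cfg/$(pxe_cfg_name "$2")"
    [ -f "$cfg" ] || { echo "missing config: $cfg"; return 1; }
    # first "kernel" line and first initrd= parameter in the file
    kernel=$(awk '$1 == "kernel" { print $2; exit }' "$cfg")
    initrd=$(sed -n 's/.*initrd=\([^ ]*\).*/\1/p' "$cfg" | head -n 1)
    rc=0
    for f in "$kernel" "$initrd"; do
        if [ -n "$f" ] && [ -e "$root/$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}
```

For the hypervisor in this example the call would be `check_pxe_paths /tftpboot 00:0c:29:09:9a:33`; a "missing" line means the record points at an image that the TFTP server cannot serve.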
Then, we need to find the snapshot configuration for the same MAC address and create a special earlyboot script, which allows access to the hypervisor over SSH without rebuilding the image:
[root@cp old_storage_image]# cd /tftpboot/images/centos5/diskless/snapshot/00-0c-29-09-9a-33/
[root@cp 00-0c-29-09-9a-33]# mkdir -p overlay
[root@cp 00-0c-29-09-9a-33]# cd overlay/
[root@cp overlay]# touch earlyboot.sh
[root@cp overlay]# chmod a+x earlyboot.sh
PLEASE NOTE: this script contains sensitive information, such as public ssh keys, so be careful when you post its contents:
[root@cp overlay]# cat earlyboot.sh
#!/bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
export PATH
rm -rf /root/.ssh/
mkdir -p /root/.ssh/
echo 'ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEArD3bolzIn6dp84NAeZaTm4lOTLkjOvhPsRt1k/YbGM8mJU+7GOHvie5EMnkUBCac+X8AzBd+ttfYsZqYfEqRhF8dnf5e1oQBVCnCUIhKAJUO33JPmbOIUfnnU47zKqGjdMjKvXBC6n2ChGVigsYMsIDciHdobgvp2SMghhS4W3qn7vpWv/kCxg4Oi2CZcoFzXgLzIi3+ZFWb7/Y9Zo+WEUjNn1vnmpJlN/A9AupAILUz+a5obq3HGkWcyx6VWWf5eGRKFSaYFWBRcV9Ux3TPHATc/yuBQnaxHkoRaxcWcs0J43H3iBqpmWoMeN2YWMt16DB/NoPlvBMjz9VeSeIW9Q== root@cp' >> /root/.ssh/authorized_keys
echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAsj0CaC7o99oC75Hz8fHl7yjTMpZ5F4Q4604bmnXMZsCl9rYHeB8TzChojXqiisokGmuGdzrtzgzHEuMkrD0Jiq1ea4vZ3ReF3E+Wlh9+oAlQl2uPUUhKkMlKeD5D8grk11jCSresmxvhn+drm0lKKwhCXKWdJ0mopZWRwS+T1PDExHmt7rx96J0ErJHpfPTWIvSVgMXDMSskaGUrptydWbePzvjkjpwU5RBbuVIV3Ei9tCoSotinPJKoBxz+sZKi4SFjXwmHiQ123zpVfcIwBJCO7gT/YFoS6FRm7PAylQ1Eud2zJz57g4qMrD58jpDGR10EKHoofC70jGCOMgmVuQ==" >> /root/.ssh/authorized_keys
chmod -R 600 /root/.ssh/
chmod 600 /root/.ssh/authorized_keys
sed -i '/authCommunity/a authCommunity execute onapp 10.76.0.100\n' /etc/snmp/snmptrapd.conf
sed -i '/com2sec/a com2sec cust_sec 10.76.0.100 onapp\n' /etc/snmp/snmpd.conf
/etc/init.d/snmpd stop
/etc/init.d/snmptrapd stop
/etc/init.d/snmpd start
/etc/init.d/snmptrapd start
/sbin/iptables -I INPUT -s 10.76.0.100/32 -d 10.76.0.21/32 -p tcp -m tcp --dport 8080 -j ACCEPT
sed -i 's/HOST.*/HOST="10.76.0.100"/' /etc/onapp.conf
/etc/init.d/storageAPI stop
/etc/init.d/storageAPI start
[root@cp overlay]#
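A mistake in earlyboot.sh only shows up when the hypervisor boots, so it can be worth checking the file on the Control Panel server first. The sketch below is an optional check and not part of the OnApp procedure; the function name check_earlyboot is hypothetical. It relies on `sh -n`, which parses a script without executing it:

```shell
# Hypothetical helper: sanity-check an earlyboot script before rebooting.
check_earlyboot() {
    script=$1
    [ -f "$script" ] || { echo "not found: $script"; return 1; }
    [ -x "$script" ] || { echo "not executable: $script"; return 1; }
    # sh -n reads the script and reports syntax errors without running it
    sh -n "$script" || { echo "syntax error in $script"; return 1; }
    echo "ok: $script"
}
```

From the overlay directory it would be run as `check_earlyboot ./earlyboot.sh`. It catches only syntax errors and a missing execute bit, not wrong addresses or keys.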
To make the PXE boot configuration change permanent, a renewal script should be created:
[root@cp ~]# touch /home/onapp/pxeboot-renew
with the following content:
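#!/bin/bash
# Usage: pxeboot-renew MAC [MAC...]  (MAC as it appears in the pxelinux.cfg file name)
runn () {
    # renewal line: point the kernel and initrd paths at the centos6old images
    sed -i 's|images/centos6/ramdisk-kvm/vmlinuz|images/centos6old/ramdisk-kvm/vmlinuz|g;s|images/centos6/ramdisk-kvm/initrd\.img|images/centos6old/ramdisk-kvm/initrd.img|g' "$@"
}
for config in "$@"; do
    fullpath=$(find /tftpboot/pxelinux.cfg/ -name "*$config")
    # skip arguments that match no file; sed -i fails when it gets no input files
    if [ -n "$fullpath" ]; then
        runn $fullpath
    fi
done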
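Because the renewal script edits the records in place, it can be useful to preview its effect first. The sketch below applies the same two substitutions without the -i flag and prints the difference; the function name preview_renew is hypothetical:

```shell
# Hypothetical helper: show what the renewal substitution would change
# in a PXE record, without modifying the file.
preview_renew() {
    cfg=$1
    # same expressions as in pxeboot-renew, but writing to stdout
    sed 's|images/centos6/ramdisk-kvm/vmlinuz|images/centos6old/ramdisk-kvm/vmlinuz|g;s|images/centos6/ramdisk-kvm/initrd\.img|images/centos6old/ramdisk-kvm/initrd.img|g' "$cfg" | diff "$cfg" -
}
```

For example, `preview_renew /tftpboot/pxelinux.cfg/01-00-0c-29-09-9a-33` prints the lines that would be rewritten. On a record that already points at centos6old it prints nothing, which also shows that running the renewal every minute is harmless.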
Then, add the script to the crontab so that it runs every minute:
[root@cp ~]# grep pxeboot-renew /etc/crontab
* * * * * root sh /home/onapp/pxeboot-renew 00-0c-29-09-9a-33
[root@cp ~]#
Note: remove this line from the crontab if you want to boot the image from the currently installed OnApp Store packages.
After all the preparations, the problematic hypervisor should be rebooted so that it boots from the older/specific image.
When the hypervisor is booted, check whether the kernel version has changed:
[root@10.76.0.21 ~]# uname -a
Linux hv000c29099a33 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@10.76.0.21 ~]#
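When several kernels are tried in turn, the check can be scripted. The following is a minimal sketch; the function name kernel_matches is hypothetical, and the optional second argument only exists so that a value collected elsewhere (for example over ssh) can be passed in:

```shell
# Hypothetical helper: compare an expected kernel release with the running one.
kernel_matches() {
    expected=$1
    # default to the local kernel when no release string is supplied
    actual=${2:-$(uname -r)}
    if [ "$actual" = "$expected" ]; then
        echo "ok: running $actual"
    else
        echo "mismatch: running $actual, expected $expected"
        return 1
    fi
}
```

From the Control Panel server this could be used as `kernel_matches 2.6.32-431.29.2.el6.x86_64 "$(ssh root@10.76.0.21 uname -r)"`.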
The above steps can be repeated as many times as needed to find the proper kernel/drivers/software combination.