Thursday, 27 February 2014

NICs Disconnection In XS 6.2


I am going to touch on a specific issue that I observed in XenServer 6.2. The pool has a bonded management interface. When a new server is introduced into the pool, the management interface for the new server goes down and doesn't come back up.

Looking into the Networking tab, you would see one interface with an "unknown" tag line and another marked as the management interface but without any IP address.

Resolution :-
Say NIC 0 and NIC 1 have been bonded.
Try the following commands in order :-

ifconfig eth0 down
ifconfig eth1 down

ifconfig eth0 up
ifconfig eth1 up

and voila, the XenServer comes up without complaining any further.

Thursday, 26 September 2013

Xen Host Crashes

XenServer host crashes and unresponsiveness are very rare, and the root cause is difficult to find. However, the pointers below will definitely help in locating and in some cases resolving the issue.

XenServer Unresponsiveness - The host goes unresponsive during lean hours and doesn't come back online; all its services become unreachable. Check the C-state settings for your processor. C-states are a power-saving feature that rests the processor's internal clock while it is idle. The problem with these states is that the XenServer host can go into such a deep sleep that its internal clock never resumes. Turbo Boost for the processor has also sometimes been seen as a culprit for the unresponsiveness. Both features can be disabled from the BIOS.
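As a rough sketch, C-states can sometimes also be limited from the control domain rather than the BIOS. The `xen-cmdline` path and the `max_cstate` option below are assumptions to verify against your XenServer build; the commands are shown via echo rather than executed, since they modify boot configuration on a live host.

```shell
# Hypothetical sketch: limit deep C-states from dom0 (verify these tools and
# options exist on your XenServer build before running).
echo "/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1"
echo "xenpm get-cpuidle-states"   # inspect per-CPU C-state usage afterwards
```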

XenServer Crash - There may be many causes for a XenServer crash. However, some frequently highlighted causes for a host crash are an OOMKILL, a kernel segfault, etc.

OOMKILL - A short form for out-of-memory kill. This can happen when some service or module starts consuming all the memory available to the control domain of the Xen host. Whether an OOM kill is behind the outage can be ascertained from the kern logs. A definite solution is not possible, as it changes with the scenario; however, if possible, the host can be taken out of the pool and rebuilt, which saves production time.
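Checking the kern logs for an OOM kill can be sketched as below. The log line is an illustrative sample of the kind the kernel writes, not output from a real host.

```shell
# Illustrative OOM-killer entry (process name, pid and score are made up)
sample='kernel: Out of memory: kill process 4242 (some-service) score 812 or a child'
# On a live host you would search the real log instead:
#   grep -i "out of memory" /var/log/kern.log
echo "$sample" | grep -ci 'out of memory'
```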

Kernel segfault - This happens when one or another module faults in the kernel. The kernel hits a segmentation fault, raises signal 11 (SIGSEGV, defined in the header file signal.h) and records the event in the kern logs.
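An illustrative segfault entry of the kind recorded in the kern logs, and the grep used to find such entries (the process name, addresses and library below are made up for the example):

```shell
# Illustrative segfault entry as the kernel records it (values are made up)
sample='kernel: some-daemon[4242]: segfault at 0 ip 00007f3a2c1b4000 sp 00007fff5e3c2a60 error 4 in libc-2.5.so'
# On a live host:  grep -i segfault /var/log/kern.log
echo "$sample" | grep -c 'segfault'
```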

Again, the solution to this issue will vary, but the log entry definitely gives some idea of what went wrong when the host crashed. It is always recommended to keep firmware patched, and the host should be patched with all Citrix hotfixes.

Kernel Panic - Sometimes after a crash the XenServer will not come up and will show a kernel panic on the console. A kernel panic is the Xen hypervisor's way of stating that it received a fatal error while booting and is not able to recover from it. It can happen because of hardware failure as well as file-system corruption.

Monday, 16 September 2013

XenServer Networking

There remains much confusion about what the management interface is, what it does, and how we should go about configuring networking for the enterprise. Below is a brief discussion :-
We can break XenServer networks into three types :-
Management Interface
VM Networks
Storage Networks

Please note that each one of them works entirely differently and hence should be configured differently.
The management interface is primarily for your XenServer hosts. It is through this interface that all the management activities of the hosts flow, such as :-
+ copying, moving, exporting your VM
+ HA heartbeats
+ Pool consistency check
+ VM management activities such as migration, startup sequences etc..
and many others

This interface, being the most important for the working of the hosts, should be bonded to provide failover if one of the NICs goes down. With the latest version of XS, i.e. 6.2, we can now have four NICs in a bond. As a best network policy, assign a static IP, although we do have the option of using DHCP for management. VLAN-tagged networks are not supported here.
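Assigning the static IP from the CLI can be sketched with xe pif-reconfigure-ip. The PIF uuid and addresses below are placeholders, so the command is shown rather than executed:

```shell
# Hedged sketch: set a static IP on the management PIF (placeholders throughout).
# Find the management PIF first with:  xe pif-list management=true
PIF_UUID="<management-pif-uuid>"
echo "xe pif-reconfigure-ip uuid=$PIF_UUID mode=static IP=10.0.0.10 netmask=255.255.255.0 gateway=10.0.0.1"
```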

VM Network :- This is the interface over which all the network traffic for each individual VM flows. It should be highlighted here that issues have been reported with offload engines for Windows Server 2003 VMs (although 2003 is EOL, VMs already running on it are not going to be migrated to another server version all of a sudden), and so the offload engines are recommended to be disabled, at the PIF level, for better performance. The NICs can be bonded for failover, and tagged networks are allowed here. DHCP can be recommended in some scenarios, such as XenDesktop-provisioned VMs; it is entirely scenario-based.
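Disabling offloads at the PIF level can be sketched via the PIF's other-config ethtool keys. The exact keys to set and the uuid below are assumptions to verify against your XenServer version's documentation, so the commands are shown rather than executed:

```shell
# Hedged sketch: turn offload engines off on a PIF (placeholder uuid; verify
# the supported other-config keys for your XS version first).
PIF_UUID="<pif-uuid>"
for key in ethtool-tx ethtool-rx ethtool-gso; do
    echo "xe pif-param-set uuid=$PIF_UUID other-config:$key=off"
done
# A PIF unplug/plug (or host reboot) is needed for the change to take effect.
```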

Storage Network :- The storage interface is the one over which the storage provisions LUNs to the hosts. The same interface is used when multipathing is enabled. Multipath-enabled networks must not be bonded (bonding the NICs defeats the purpose of multipathing, and performance is also reduced). The storage NICs must be on different subnets, and so should the storage controllers.


Saturday, 7 September 2013

XenServer Multipath Issues

I would like to elaborate some simple techniques for one multipath issue on iSCSI storage that very often bogs down the enterprise.

Some paths not available to newly added hosts :- This is a very common issue. A host is added to the pool and, instead of locating all the paths to the storage, it sees only a few paths.

*) The first troubleshooting step is to check whether you are able to ping all the IPs of the storage.
*) Next, check the multipath status :-
multipath -ll
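The ping check above can be scripted as below; the two controller IPs are placeholders for your own storage networks:

```shell
# Ping each storage controller IP once (placeholder addresses)
for ip in 10.1.1.10 10.1.2.10; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
        echo "$ip reachable"
    else
        echo "$ip UNREACHABLE"
    fi
done
```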
You could go for a forward troubleshooting technique :-
Manually discover each of the nodes :-
service open-iscsi restart
# iscsiadm -m discovery -t sendtargets -p <ip_addr_of_storage>
Note :- The storage IPs for the controllers should be on different networks. This could be a probable reason why XenServer fails to connect to a controller: the host was simply not configured to reach that storage network. Always check the storage network settings in the XenServer Networking tab.

Then log on to all the nodes of the storage in one shot :-
# iscsiadm -m node -L all

Or log on to each of the nodes separately :-
# iscsiadm -m node -T <target-iqn> -p <ip-address-of-the-storage> -l

If this doesn't resolve the issue, give the server a reboot.

If the above technique fails you could probably go for a reverse engineering technique :-

Log off all the targets :-
iscsiadm -m node -U all

Shut the iSCSI service down and keep it from starting at boot :-
service open-iscsi stop
chkconfig open-iscsi off

Change the multipath status of the host to inactive and then reboot.

Re-initialize the multipath configuration on the host.

I hope this article comes in handy when you face issues like this one.

Tuesday, 3 September 2013

LVM Over ISCSI On XenServer

XenServer has been using LVM as the default local storage management. Moreover, many popular SAN solutions use LVM-based volume groups. It is therefore very important to understand how LVM works. I will restrict myself to the context of XenServer; otherwise this would be a very lengthy paper without any immediate advantage.

LVM :- The very simple concept of LVM is to take a group of physical disks, put them in a logical container to make a volume group, and then carve chunks out of it in the form of logical volumes. The advantage of this approach is easy management and maintenance of disks.
XenServer local storage is primarily /dev/sda3, unless alternative partitioning has been done.
This sda3 is the physical volume built from the disks provided in the server itself. The volume group in this context would be /dev/VG_XenStorage-XXX, where XXX is a unique identifier. The logical volumes are the VHDs (Virtual Hard Disks) backing the virtual disks; on file-based storage such as NFS, the VHDs live as files instead. However, purely for abstraction, XenServer identifies every disk as a Virtual Disk Image (VDI) and not as a VHD/LV. This can be verified: when we list a virtual machine's disks, XenServer only reports the VDIs associated with the machine, never the underlying VHDs/LVs. We have to manually link the VHDs with the VDIs using commands such as :-
xe vm-disk-list uuid=<vm-uuid>
xe vdi-list name-label=<vdi-name-label>

The concept is very similar for LVM storage on shared storage. All the stages of LVM can be verified with the following commands :-
pvscan or pvs - shows the physical disk(s) associated with each volume group.
vgscan or vgs - shows the volume groups.
lvscan or lvs - shows the logical volumes, i.e. the virtual hard disks for the virtual machines.
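On an LVM-based SR, the link between a VDI and its logical volume can be followed by name: to the best of my knowledge the LV is named VHD-<vdi-uuid> (worth verifying on your SR). The uuid below is a placeholder:

```shell
# Hedged sketch: derive the expected LV name for a VDI on an LVM-based SR
VDI_UUID="12345678-abcd-ef01-2345-6789abcdef01"   # placeholder, from xe vdi-list
echo "VHD-$VDI_UUID"
# On a host:  lvs | grep "VHD-$VDI_UUID"
```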

iSCSI - We all know that iSCSI is used prominently for connecting storage to the XenServer. It is an IP-based protocol for linking data storage systems. Some key components of this protocol are :-

Initiator :- The IQN is the unique identifier generated for each host, through which storage systems recognise the hosts connected over iSCSI. In XenServer the IQN is generated at installation time and can be viewed in the General tab of the host. It can also be altered at any time (which is not recommended at all).

Target :- The target is the storage system to which the XenServer is required to connect.

LUN :- A Logical Unit Number identifies the block device presented once the iSCSI connection is established between the storage and the XenServer. From the XenServer's point of view, the volume group for the shared storage is then created on top of this LUN.

The initiator, target and LUN are tied together in XenServer, so that the host can utilise the benefits of storage management over IP and connect to shared storage.

Monday, 26 August 2013

XenServer error :- This operation can't be performed because the specified virtual disk could not be found

The error in the title can be encountered many a time. I came across this one while performing a XenServer upgrade.
My observations :- The VM had its disk intact, as shown in the VM's Storage tab. The storage showed the VM connected to its virtual disk. At the LVM level the VHD was active, and a VHD scan came back clean for the storage. A rescan of the shared storage was successful.

If all the above holds true in your case, try ejecting the ISO file from the VM's CD/DVD drive and fire the VM up. It won't complain about the virtual disk any more and will come up like a charm.
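Ejecting the ISO can also be done from the CLI with xe vm-cd-eject. The VM name below is a placeholder, so the command is shown rather than executed:

```shell
# Hedged sketch: eject whatever is in the VM's virtual CD/DVD drive
VM_NAME="<vm-name-label>"
echo "xe vm-cd-eject vm=$VM_NAME"
```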

Post me if there is any other query about this error.