Author: K.M. Ali Qamar

How To Monitor a VMware Environment with Grafana

This step-by-step guide uses the official Telegraf vSphere plugin to pull metrics from vCenter, covering compute, network, and storage resources. Before starting, I assume you have a freshly installed operating system, Ubuntu 20.04, so let's begin.

Step 1: Install Grafana on Ubuntu

This tutorial was tested on a freshly installed Ubuntu 20.04.

  • Download and install the Grafana package.

wget https://dl.grafana.com/oss/release/grafana_7.1.3_amd64.deb

sudo dpkg -i grafana_7.1.3_amd64.deb

  • Now start and enable your Grafana service.

sudo systemctl start grafana-server.service

sudo systemctl enable grafana-server.service

  • Check Grafana service status.

sudo systemctl status grafana-server.service

  • At this point, Grafana is installed, and you can log in at the following URL:

url: http://[your Grafana server ip]:3000

The default username/password is admin/admin

  • Upon the first login, Grafana will ask you to change the password.
  • Keep in mind that HTTP is not a secure protocol; you can further secure Grafana with SSL/TLS certificates, as sketched below.
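
  • As a minimal sketch (assuming you have already obtained a certificate and key; the file paths below are placeholders), edit the [server] section of /etc/grafana/grafana.ini:

[server]
protocol = https
cert_file = /etc/grafana/fullchain.pem
cert_key = /etc/grafana/privkey.pem

  • Then restart Grafana to apply the change:

sudo systemctl restart grafana-server.service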

Step 2: Install InfluxDB

  • Check which InfluxDB version is available in your apt cache with the following command.

sudo apt-cache policy influxdb

This shows the version available from Ubuntu's default repositories. We want the later 1.8 release of InfluxDB, so we will add the official InfluxData repository and then update the apt cache.

wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -

source /etc/lsb-release

echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

sudo apt update

sudo apt-cache policy influxdb

sudo apt install influxdb -y

  • Start InfluxDB, check its status, and enable it so that it survives a reboot.

sudo systemctl start influxdb

sudo systemctl status influxdb

sudo systemctl enable influxdb

  • InfluxDB listens on port 8086. If your server is reachable from the internet, then depending on your existing firewall rules, anybody may be able to query it at the following URL:

http://[your domain name or IP]:8086/metrics

  • The local machine used for this test has no firewall enabled, but if your server uses a public IP, you can prevent direct access with the following commands:

sudo iptables -A INPUT -p tcp -s localhost --dport 8086 -j ACCEPT

sudo iptables -A INPUT -p tcp --dport 8086 -j DROP
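
  • Optionally, if you plan to enable InfluxDB authentication (the Telegraf output section later has commented-out credential fields for this), you can create the database and a user up front. This is a minimal sketch; the user name and password are placeholders:

influx
> CREATE DATABASE vmware
> CREATE USER "telegraf_user" WITH PASSWORD 'STRONG_PASSWD'
> GRANT ALL ON vmware TO "telegraf_user"
> exit

  • Authentication itself is switched on with auth-enabled = true in the [http] section of /etc/influxdb/influxdb.conf, followed by a restart of the influxdb service.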

Step 3: Install Telegraf

  • Now install Telegraf.

sudo apt install telegraf -y

  • Start Telegraf, check its status, and enable it so it starts automatically after a reboot.

sudo systemctl start telegraf

sudo systemctl status telegraf

sudo systemctl enable telegraf

  • Configure Telegraf to pull monitoring metrics from vCenter by editing its main configuration file, /etc/telegraf/telegraf.conf.
  • First add the InfluxDB output section shown below, and change the InfluxDB address and credentials to match your setup.

————————————————————————————————————————————–

[[outputs.influxdb]]
urls = ["http://<Address_of_influxdb_server>:8086"]
database = "vmware"
timeout = "0s"
## only needed if you are using authentication for the database
# username = "USERNAME_OF_DB"
# password = "PASSWD_OF_DB"

————————————————————————————————————————————-

# Read metrics from VMware vCenter
[[inputs.vsphere]]
## List of vCenter URLs to be monitored. These three lines must be uncommented
## and edited for the plugin to work.
vcenters = [ "https://<vCenter_IP>/sdk" ]
username = "administrator@vsphere.local"
password = "PASSWD"
#
## VMs
## Typical VM metrics (if omitted or empty, all metrics are collected)
vm_metric_include = [
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.run.summation",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.wait.summation",
"mem.active.average",
"mem.granted.average",
"mem.latency.average",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.usage.average",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.usage.average",
"power.power.average",
"virtualDisk.numberReadAveraged.average",
"virtualDisk.numberWriteAveraged.average",
"virtualDisk.read.average",
"virtualDisk.readOIO.latest",
"virtualDisk.throughput.usage.average",
"virtualDisk.totalReadLatency.average",
"virtualDisk.totalWriteLatency.average",
"virtualDisk.write.average",
"virtualDisk.writeOIO.latest",
"sys.uptime.latest",
]
# vm_metric_exclude = [] ## Nothing is excluded by default
# vm_instances = true ## true by default
#
## Hosts
## Typical host metrics (if omitted or empty, all metrics are collected)
host_metric_include = [
"cpu.coreUtilization.average",
"cpu.costop.summation",
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.swapwait.summation",
"cpu.usage.average",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.utilization.average",
"cpu.wait.summation",
"disk.deviceReadLatency.average",
"disk.deviceWriteLatency.average",
"disk.kernelReadLatency.average",
"disk.kernelWriteLatency.average",
"disk.numberReadAveraged.average",
"disk.numberWriteAveraged.average",
"disk.read.average",
"disk.totalReadLatency.average",
"disk.totalWriteLatency.average",
"disk.write.average",
"mem.active.average",
"mem.latency.average",
"mem.state.latest",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.totalCapacity.average",
"mem.usage.average",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.errorsRx.summation",
"net.errorsTx.summation",
"net.usage.average",
"power.power.average",
"storageAdapter.numberReadAveraged.average",
"storageAdapter.numberWriteAveraged.average",
"storageAdapter.read.average",
"storageAdapter.write.average",
"sys.uptime.latest",
]
# host_metric_exclude = [] ## Nothing excluded by default
# host_instances = true ## true by default
#
## Clusters
cluster_metric_include = [] ## if omitted or empty, all metrics are collected
# cluster_metric_exclude = [] ## Nothing excluded by default
# cluster_instances = false ## false by default
#
## Datastores
datastore_metric_include = [] ## if omitted or empty, all metrics are collected
# datastore_metric_exclude = [] ## Nothing excluded by default
# datastore_instances = false ## false by default for Datastores only
#
## Datacenters
datacenter_metric_include = [] ## if omitted or empty, all metrics are collected
# datacenter_metric_exclude = [ "*" ] ## Datacenters are not collected by default.
# datacenter_instances = false ## false by default
#
## Plugin Settings
## separator character to use for measurement and field names (default: "_")
# separator = "_"
#
## number of objects to retrieve per query for realtime resources (vms and hosts)
## set to 64 for vCenter 5.5 and 6.0 (default: 256)
# max_query_objects = 256
#
## number of metrics to retrieve per query for non-realtime resources (clusters and datastores)
## set to 64 for vCenter 5.5 and 6.0 (default: 256)
# max_query_metrics = 256
#
## number of go routines to use for collection and discovery of objects and metrics
# collect_concurrency = 1
# discover_concurrency = 1
#
## whether or not to force discovery of new objects on initial gather call before collecting metrics
## when true for large environments, this may cause errors for time elapsed while collecting metrics
## when false (default), the first collection cycle may result in no or limited metrics while objects are discovered
# force_discover_on_init = false
#
## the interval before (re)discovering objects subject to metrics collection (default: 300s)
# object_discovery_interval = "300s"
#
## timeout applies to any of the api request made to vcenter
# timeout = "60s"
#
## Optional SSL Config
# ssl_ca = "/path/to/cafile"
# ssl_cert = "/path/to/certfile"
# ssl_key = "/path/to/keyfile"
## Use SSL but skip chain & host verification
insecure_skip_verify = true

—————————————————————————————————————

  • You only need to change the vCenter and InfluxDB addresses and credentials in the sections above.
  • Restart and enable the Telegraf service after making the changes; a quick way to verify the vSphere input is shown after these commands.

sudo systemctl restart telegraf

sudo systemctl enable telegraf
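
  • To verify the configuration, you can run Telegraf once in test mode; it prints the collected metrics to the terminal instead of writing them to InfluxDB:

telegraf --config /etc/telegraf/telegraf.conf --input-filter vsphere --test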

Step 3.1: Check InfluxDB Metrics

  • We need to confirm that our metrics are being pushed to InfluxDB and that we can see them.
  • If you are using authentication, open the InfluxDB shell like this:

$ influx -username 'username' -password 'PASSWD'

  • Otherwise, open the InfluxDB shell without credentials:

$ influx

  • Then switch to the vmware database:

> USE vmware

Using database vmware

  • Check whether there is an inflow of time-series metrics:

> SHOW MEASUREMENTS

name: measurements
name
----
cpu
disk
diskio
kernel
mem
processes
swap
system
vsphere_cluster_clusterServices
vsphere_cluster_mem
vsphere_cluster_vmop
vsphere_datacenter_vmop
vsphere_datastore_datastore
vsphere_datastore_disk
vsphere_host_cpu
vsphere_host_disk
vsphere_host_mem
vsphere_host_net
vsphere_host_power
vsphere_host_storageAdapter
vsphere_host_sys
vsphere_vm_cpu
vsphere_vm_mem
vsphere_vm_net
vsphere_vm_power
vsphere_vm_sys
vsphere_vm_virtualDisk

Step 4: Add InfluxDB Data Source to Grafana

  • Log in to Grafana and add an InfluxDB data source.
  • Click the Configuration (gear) icon and then click Data Sources.
  • Click Add data source and select InfluxDB.
  • Under HTTP, enter the InfluxDB URL (http://[your InfluxDB server IP]:8086), and under InfluxDB Details, enter the database name vmware.
  • If you enabled authentication on InfluxDB, enter the username and password here as well. Alternatively, you can add the data source through the Grafana HTTP API, as sketched below.
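
  • If you prefer the command line, here is a rough sketch using Grafana's HTTP API; the admin credentials, Grafana address, and InfluxDB address are placeholders you must adjust:

curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{"name":"InfluxDB-vmware","type":"influxdb","access":"proxy","url":"http://localhost:8086","database":"vmware"}'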


Step 5: Import Grafana Dashboards

  • The last step is to create or import Grafana dashboards.
  • Building a Grafana dashboard from scratch is a lengthy process, so we will use a community dashboard built by Jorge de la Cruz.


  • We will import the pre-built Grafana dashboard #8159. As soon as the import finishes, you will see your vSphere metrics on the dashboard.


How Ansible Manages Configuration Files

This article discusses where the Ansible configuration files are located, how Ansible selects which one to use, and how to change the default settings.

Configuring Ansible:

Ansible's behavior can be customized by modifying settings in its configuration files. Ansible chooses its configuration file from one of several locations on the control node.

  • /etc/ansible/ansible.cfg
    This file contains the base configuration of Ansible. It is used if no other configuration file is found.
  • ~/.ansible.cfg
    This configuration is used instead of /etc/ansible/ansible.cfg because Ansible looks for .ansible.cfg in the user's home directory.
  • ./ansible.cfg
    If the ansible command is executed in a directory that contains an ansible.cfg file, that ./ansible.cfg is used (a minimal example follows this list).
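
A minimal project-level ansible.cfg might look like this (the inventory path and remote user are only illustrative):

[defaults]
inventory = ./inventory
remote_user = ali
host_key_checking = False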

Recommendation for Ansible configuration files:

Ansible recommends creating an ansible.cfg file in the directory from which you run the ansible command.

The ANSIBLE_CONFIG variable

Ansible also gives you a handier option for defining the location of the configuration file: the environment variable ANSIBLE_CONFIG. If you define ANSIBLE_CONFIG, Ansible uses the configuration file that the variable specifies instead of any of the previously mentioned configuration files.
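
For example (the path below is only an illustration), you can point Ansible at a specific configuration file for the current shell session and confirm the result with ansible --version:

export ANSIBLE_CONFIG=/home/ali/projects/web/ansible.cfg
ansible --version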

Configuration File Precedence:

  • First preference: the ANSIBLE_CONFIG environment variable overrides all other configuration files. If this variable is not set, the second preference is checked.
  • Second preference: the directory in which the ansible command was run is checked for an ansible.cfg file. If this file is not present, Ansible moves to the third preference.
  • Third preference: the user's home directory is checked for a .ansible.cfg file.
  • Fourth preference: the global /etc/ansible/ansible.cfg file is used only if no other configuration file is found.

Because Ansible can read its configuration from multiple locations, it is sometimes confusing to determine which configuration file is active.

So how can a user determine which file is active?

How to check which Ansible configuration file is being used?

You can run the ansible --version command to identify which version of Ansible is installed and which configuration file is being used.

[ali@controller /]$ ansible --version
ansible 2.9.16
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/ali/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.6.8 (default, Aug 24 2020, 17:57:11) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
[ali@controller /]$

Do you need servers to practice Ansible or Linux?

SeiMaxim is a leading Dutch web hosting company and provides resources to learn Ansible and Linux. If you want virtual servers to learn Ansible, you can place your order and use code SE-ANSIBLE211 to rent two servers for just 18 USD.

All You Need To Know About iSCSI on VMware

Basics of iSCSI

In the computing world, iSCSI is an acronym for Internet Small Computer Systems Interface, an Internet Protocol (IP)-based storage networking standard for linking data storage facilities. It provides block-level access to storage devices by carrying SCSI commands over a TCP/IP network.

With the emergence of high-speed networks offering 2.5, 5, 25, 40, 50, and 100 Gbps speeds, iSCSI is becoming more popular. Fibre Channel still dominates production environments, but for non-critical environments and for customers looking for an inexpensive storage solution, iSCSI is the best option. In recent years vSphere has seen many improvements in its iSCSI software initiator, and especially with jumbo frame support, iSCSI is spreading widely in the industry.

Let's start with the basics: iSCSI is one of the main IP storage standards, especially for non-critical workloads. In the figure below, a server on the network is accessing block storage.

[Figure: a server on the network accessing block storage]

Type-1 hypervisors can support different storage technologies and protocols for presenting external storage devices. We will mostly discuss VMware here, with a little bit about KVM. vSphere has supported iSCSI storage since "Virtual Infrastructure 3".

Adoption of iSCSI

We do not need to build a separate network as we do with FC; iSCSI can run on our common LAN, MAN, and WAN networks, and TCP/IP has no distance limits. Skilled manpower and open-source TCP/IP tools are also widely available. These are the main benefits we have over FC if we implement iSCSI.

Nowadays you do not need a dedicated storage administrator the way companies did ten years ago. Storage arrays have evolved a lot in recent years and are now easy to configure; management software does all the RAID setup and hardware monitoring for you. One of the main benefits of an iSCSI implementation is that it is inexpensive compared with other storage protocols such as Fibre Channel.

"The bitterness of poor performance lasts long after the sweetness of a cheap price is forgotten." (Michael Webster, VMworld 2013)

When iSCSI uses ordinary network interface cards rather than dedicated iSCSI adapters, it can be expected to consume a significant amount of your servers' CPU resources. There are many ways to overcome this problem; one of them is to use a TOE (TCP Offload Engine) capable NIC.

What does TOE do?

It simply moves TCP packet processing tasks from the server CPU to a specialized TCP processor on the network adapter, or possibly on the storage device. The concept of offloading work from the main processor is similar to that governing graphics coprocessors, which offload 3D calculations and visual rendering tasks from the main CPU.

The ability of TOE to perform full transport-layer functionality is essential to obtaining tangible benefits; the important aspect of this layer is that it is the process-to-process layer.

In my view, cost is unquestionably the main issue that has hindered the adoption of TOE in the general enterprise community. Typical TOE-capable cards range in price from $400 to $2000, and in some servers you need to use an expansion slot or even a riser, so there is additional cost, and the benefit is not big enough for everybody to consider buying TOE cards.

Moreover, in my view, VMware has improved vSphere a lot over time, especially with the freedom to enable jumbo frames, so I would prefer to use the software iSCSI initiator; with TOE you do not have that flexibility. For supported VMware hardware, please refer to the VMware HCL.

Difference between iSCSI and Fibre Channel

One of the main differences between iSCSI and Fibre Channel is the way they handle I/O congestion. When an iSCSI path is overloaded, drops packets, and becomes substantially oversubscribed, the situation quickly grows worse: performance degrades further because dropped packets must be resent. The FC protocol, on the other hand, has a built-in pause mechanism for when congestion occurs. So the two protocols have different mechanisms for handling congestion.

Currently, many vendors implement delayed ACK and congestion avoidance as part of their TCP/IP stack. VMware recommends consulting your iSCSI array vendor for specific recommendations about delayed ACK.

TCP delayed acknowledgment

TCP delayed acknowledgment is a technique used to improve network performance. In essence, several ACK responses may be combined together into a single response, reducing protocol overhead.

Difference between iSCSI and NAS

NAS presents storage at the file level. A NAS device is specialized for serving files, whether by its hardware, software, or configuration, and it is often sold as a computer appliance.

iSCSI Architecture

iSCSI is an Internet Engineering Task Force (IETF) standard for encapsulating SCSI control and data in TCP/IP packets. The figure below shows how iSCSI is encapsulated in TCP/IP and Ethernet frames.

[Figure: iSCSI encapsulation within TCP/IP and Ethernet frames]

VMware iSCSI Names

iSCSI names are globally unique and are not bound to any Ethernet adapter or IP address. iSCSI supports two naming formats: the Extended Unique Identifier (EUI) and the iSCSI Qualified Name (IQN).
Basically, iSCSI is a client-server architecture. The clients of an iSCSI interface are known as initiators, and the servers that share the storage are known as targets.
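
As an illustration (these identifiers are made up), the two formats look like this:

iqn.1998-01.com.vmware:esxi01-6f2c9a41   (IQN: iqn.<year-month>.<reversed domain of naming authority>:<unique string>)
eui.0123456789ABCDEF                     (EUI: "eui." followed by a 16-hexadecimal-digit identifier)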

iSCSI Components

There are two basic iSCSI components;

Initiator

It functions as an iSCSI client. An iSCSI initiator sends SCSI commands over an IP network. There are two kinds of initiators: software and hardware.

A software initiator implements iSCSI in code, typically as a kernel device driver that uses the network card and network stack to emulate SCSI devices by speaking the iSCSI protocol.
Nowadays almost all popular operating systems come with software initiators. The table below shows when operating systems released their software initiators.

Operating System    First release    Version         Features
VMware ESX          2006             ESX 3.0-7.x     Target, Multipath
Linux               2005             2.6.12, 3.1     Initiator, Target, Multipath, VAAI
Windows             2003             2000, WIN19     Initiator, Target, Multipath
FreeBSD             2009             7.0             Initiator

Target

iSCSI refers to a storage resource located on an iSCSI server as a target. Targets are typically data providers; in practice this is your storage array, which presents distinct iSCSI targets to numerous clients.
In the context of vSphere, iSCSI initiators fall into three distinct categories.

Software iSCSI Adapter

This is VMware code built into the VMkernel; it enables your host to connect to the iSCSI storage device through a standard network adapter.

Dependent Hardware iSCSI Adapter

This type of adapter is a card that presents a standard network adapter and iSCSI offload functionality on the same port, and it depends on networking and iSCSI management interfaces provided by VMware. An example of a dependent adapter is the iSCSI-licensed Broadcom 5709.

Independent Hardware iSCSI Adapter

An independent hardware iSCSI adapter is a card that presents iSCSI offload functionality, either alone or together with standard NIC functionality. The iSCSI offload functionality has independent configuration management that assigns the IP address, MAC address, and other parameters used for the iSCSI sessions. These are the TOE cards we talked about earlier.

To check the status of TSO and other offload features in ESX/ESXi, run the command:

ethtool -k vmnicX

To disable TSO within a Linux guest OS, run the command:

ethtool -K ethX tso off

Simplest Topology of an iSCSI Array

In the figure below, four ESX hosts are connected in the simplest form: each ESX host has two uplinks going to two switches, and on the other side the storage array is connected to the same pair of switches. All connections are redundant.

[Figure: four ESX hosts with redundant uplinks to two switches connected to the storage array]

Try to avoid vSphere NIC teaming and use port binding instead. With port binding you can use multipathing to maintain availability of access to the iSCSI targets.
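
As a rough sketch (the adapter and VMkernel NIC names below are placeholders that vary per host, and each bound vmk needs a compliant single-uplink teaming policy), port binding can also be configured from the ESXi command line:

esxcli iscsi networkportal add -A vmhba33 -n vmk1
esxcli iscsi networkportal add -A vmhba33 -n vmk2
esxcli iscsi networkportal list -A vmhba33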

There will be some scenarios where you need to use teaming. If that is the case, turn off port security on the switch for the two ports on which the virtual IP address is shared, because that security setting is designed to prevent spoofing of the IP address and would otherwise interfere with the shared address.

How to add an iSCSI initiator to the vSphere ESXi

Before adding a new iSCSI initiator, here are some recommendations.

  • Make sure that the host recognizes LUNs at start-up.
  • The SCSI controller driver in the guest operating system should have a sufficiently large queue. For Windows guests, increase the SCSI TimeoutValue parameter so the OS can better tolerate delayed I/O resulting from path failover.
  • Configure your environment to have only one VMFS datastore for each LUN.

Select the ESXi host, click Configure, then Storage Adapters, and click + Add Software Adapter. In the window that pops up, choose "Add software iSCSI adapter". A command-line alternative is sketched below.
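
If you prefer the ESXi command line over the vSphere Client, here is a minimal sketch; the adapter name and the target address are placeholders for your own environment:

esxcli iscsi software set --enabled=true
esxcli iscsi adapter list
esxcli iscsi adapter discovery sendtarget add -A vmhba65 -a 192.168.10.50:3260
esxcli storage core adapter rescan -A vmhba65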
