Q1. What are pipelines ?
A1.
A circular hose .. which runs miles and carries water, oil, waste water .. ehh ??? Okay let me re-phrase .. what are DevOps Pipelines ?? Aah
A DevOps pipeline is a flow methodology in which a set of operations is carried out in a serial manner.
A DevOps pipeline is a set of practices that the development (Dev) and operations (Ops) teams implement to build, test, and deploy software faster and easier. One of the primary purposes of a pipeline is to keep the software development process organized and focused.
Actually, "Pipeline" seems a bit misleading here .. unless pipes are meant to be broken and junctions fitted .. an assembly line would be more apt ! But yeah, "AI DevOps Assembly Line" won't sound as catchy ;)
This also means that, similar to software development, we have a continuously running and improving assembly line. At each stage of an assembly line (chassis making, fitting engines, fitting the gearbox, painting etc.), each task is refined to be the best it can be. The stages are then assembled in a pipelined manner to ensure the best product in the end.
Q2. So how many tasks (like chassis making, fitting engines, fitting gearbox, painting etc) are you planning to do here ?
A2. Well, an analogy for me here would be :
a) The developer writes ML code and unit-tests it.
b) The ML code needs a large amount of data (read: the training set); the unit test in step a) uses only a small subset.
c) Dockerize the code along with all its dependencies and create a container.
d) By dependencies I also mean the data set (which could again be imported from Keras).
e) Deploy the container onto Kubeflow.
When you develop and deploy an ML system, the ML workflow typically consists of several stages. Developing an ML system is an iterative process. You need to evaluate the output of various stages of the ML workflow, and apply changes to the model and parameters when necessary to ensure the model keeps producing the results you need.
In our experiment, we will create a model for identifying fashion apparel. The training data set is loaded from Keras (classification.py); this creates a model [mymodel.h5], and another Python program, replicating a production environment, compares its test data against this model and derives an intelligent output.
Q3. Now what on earth is that ? Kubeflow ?
A3. Kubeflow is a free and open-source machine learning platform designed to enable using machine learning pipelines to orchestrate complicated workflows running on Kubernetes.
Q4. Very well .. are we in an Ekta Kapoor era .. what next, "Kyunki Saas Bhi Kabhi Bahu Thi" ? Why does everything start with "K" .. what is Kubernetes ?? Keras ??
A4. Kubernetes (commonly stylized as K8s) is an open-source container-orchestration system for automating computer application deployment, scaling, and management.
Keras is an open-source neural-network library written in Python.
Great .. now it's starting to make some sense :)
It's like .. say I have a laptop running Ubuntu OS. Here Kubernetes plays the role of the laptop, replacing the hardware with virtual hardware, and then I have a software process called Kubeflow which helps spawn multiple workloads on K8s to realize multiple deployments.
Keras just comes in between, to help develop the ML code in Python.
This is what my folder structure looks like :

We are more interested in the orange-boxed files.
Pipeline/pipeline2/component1/tf/classification.py
from __future__ import absolute_import, division, print_function, unicode_literals
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
#Printing the Tensorflow version
print(tf.__version__)
# Getting fashion mnist dataset from Keras
# Splitting datasets into Test and Train datasets
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
#Labelling the datasets
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images.shape
....
....
# Grab an image from the test dataset.
img = test_images[1]
# Add the image to a batch where it's the only member.
img = (np.expand_dims(img,0))
predictions_single = model.predict(img)
plot_value_array(1, predictions_single[0], test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)
np.argmax(predictions_single[0])
#Saving model
model.save('mymodel.h5')
predictions[9]
np.argmax(predictions[9])
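The "...." above elides the data normalization, model definition and training. A minimal sketch of what that part typically looks like for Fashion-MNIST (the layer sizes and epoch count here are illustrative assumptions, not necessarily the exact values used):
# Illustrative sketch of the elided section -- normalization, model build, training.
# Normalize pixel values to the 0-1 range.
train_images = train_images / 255.0
test_images = test_images / 255.0
# A simple dense network for the 28x28 Fashion-MNIST images (layer sizes assumed).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train on the Keras training split (epoch count assumed).
model.fit(train_images, train_labels, epochs=10)
# Predictions over the whole test set (used above, e.g. predictions[9]).
predictions = model.predict(test_images)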
Pipeline/pipeline2/component1/Dockerfile
FROM tensorflow/tensorflow:nightly-py3-jupyter
RUN apt-get update &&\
apt-get upgrade -y &&\
apt-get install -y git && \
pip install matplotlib
COPY . .
ENV ip "XX.XX.XXX.XXX" // Replace with your VM IP, where the local GitLab ought to run.
ENV user "root" // Replace with user (else root is you use public-keys) [More on how to create root user below]
ENV passwd "xxxxx" // Replace with the user password or root passwd
CMD /usr/bin/git config --global user.name "root" && \
/usr/bin/git config --global user.email "rajesh.bhaskaran@aricent.com" && \
/usr/bin/git clone http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git && \
/usr/bin/python3 ./tf/classification.py && cd tensorflow/ && cp ../mymodel.h5 . && \
/usr/bin/git pull --allow-unrelated-histories && \
/usr/bin/git add mymodel.h5 && \
/usr/bin/git commit "-m" "Commit 1.0" && \
/usr/bin/git push http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git
>> [More on how to create a root user here]
rajeshb@AzureUbuntu:~$ sudo passwd   # This creates and sets a password for the root user
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
This dockerizes the program into a container image; when the container runs, the model file mymodel.h5 is created and pushed to the GitLab repo.
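For reference, the component image itself is built (and, if you use a registry, pushed) in the usual way; the image name and tag below are placeholders -- use whatever names you later reference in the pipeline code :
rajeshb@AzureUbuntu:~/Pipeline/pipeline2/component1$ sudo docker build -t <docker-image-name>:<tag> .
rajeshb@AzureUbuntu:~/Pipeline/pipeline2/component1$ sudo docker push <docker-image-name>:<tag>   # only if a registry is used
Since minikube is started with --vm-driver none (see below), the cluster shares the host's Docker daemon, which is also why the pipeline later sets the image pull policy to IfNotPresent.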
Pipeline/pipeline2/component2/tf/prediction.py
from __future__ import absolute_import, division, print_function, unicode_literals
import os
import tensorflow as tf
from tensorflow import keras
#Load model
new_model = tf.keras.models.load_model('mymodel.h5')
fashion_mnist = keras.datasets.fashion_mnist
# Load Inference data
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
# Getting model metrics
test_loss, test_acc = new_model.evaluate(test_images, test_labels, verbose=2)
#Print Test accuracy
print('\nTest accuracy:', test_acc)
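The script above only prints the overall accuracy. If you also want the "intelligent output" mentioned earlier (a class name for a single image), a small optional addition to prediction.py could look like this (class_names copied from classification.py):
import numpy as np
# Class labels, in the same order as in classification.py
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Predict on a single test image, mimicking a production inference call
img = np.expand_dims(test_images[0], 0)          # batch with one member
prediction = new_model.predict(img)
print('Predicted class:', class_names[np.argmax(prediction[0])])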
Pipeline/pipeline2/component2/Dockerfile
FROM omie12/image-class:5.0
RUN pip install -q pyyaml h5py && \
rm -rf tensorflow
COPY . .
# Settings same as the above Dockerfile
ENV ip "xx.xx.xx.xxx"
ENV user "xxx"
ENV passwd "xxxxxx"
CMD /usr/bin/git clone http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git && cd tensorflow && ls && /usr/bin/python3 ../tf/prediction.py
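Before wiring the components into a pipeline, each container can be smoke-tested locally. The image name below is a placeholder and the -e flags simply override the ENV values baked into the Dockerfile :
rajeshb@AzureUbuntu:~$ sudo docker run --rm -e ip=<VM-IP> -e user=<user> -e passwd=<password> <component-image>:<tag>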
Setting up the other components of this assembly line : 1. GitLab : running it locally
The GitLab Docker images are monolithic images of GitLab running all the necessary services in a single container.
The following command runs a local instance of GitLab :
rajeshb@AzureUbuntu:~$ sudo docker run --detach --name gitlab --hostname localhost --publish 30080:30080 --publish 30022:22 --env GITLAB_OMNIBUS_CONFIG="external_url 'http://localhost:30080'; gitlab_rails['gitlab_shell_ssh_port']=30022;" gitlab/gitlab-ce:9.1.0-ce.0
>> Open the local GitLab on : http://VM-IP:30080
>> Click on New Project to create a project named tensorflow (in our case)
>> Create and commit a README file in the new project (a command-line equivalent is sketched below).
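In case you prefer the command line over the web UI for the README step, the equivalent would be something like this (the placeholders are the same as in the Dockerfiles) :
rajeshb@AzureUbuntu:~$ git clone http://<user>@<VM-IP>:30080/<user>/tensorflow.git
rajeshb@AzureUbuntu:~$ cd tensorflow && echo "# tensorflow" > README.md
rajeshb@AzureUbuntu:~/tensorflow$ git add README.md && git commit -m "Add README" && git push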
2. Jupyter Notebook Server
Jupyter is an open-source service for interactive computing across dozens of programming languages. Spun off from IPython in 2014 by Fernando Pérez, Project Jupyter supports execution environments in several dozen languages. We can use Jupyter to run the Python script that generates the pipeline YAML file.
Jupyter is part of the Kubeflow platform. The two steps below should get it up and running :
rajeshb@AzureUbuntu:~/kubeflow$ pwd
/home/rajeshb/kubeflow
rajeshb@AzureUbuntu:~/kubeflow$ sudo minikube start --vm-driver none
> Minikube should start successfully !!
rajeshb@AzureUbuntu:~/kubeflow$ kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
> All YAML processes should be deployed successfully !!!
> INFO[0014] Applied the configuration Successfully! < -- Should be seen
Possible errors : Permission denied
Solution : rajeshb@AzureUbuntu:~/kubeflow$ sudo chown -R $USER:$USER ~/.minikube/
This sets the ownership of the hidden .minikube folder to your own user.
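To check that Kubeflow is actually up and to reach its Dashboard, the usual kubectl checks are (the local port 8080 below is just an arbitrary choice) :
rajeshb@AzureUbuntu:~/kubeflow$ kubectl get pods -n kubeflow        # all pods should eventually be Running
rajeshb@AzureUbuntu:~/kubeflow$ kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
>> Open the Kubeflow Dashboard on : http://localhost:8080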
Step 1 : Create the Pipeline
> Create a Jupyter Notebook to run the Python code. Open the Kubeflow Dashboard

Launch the Notebook with the above configurations


#pip install kfp==1.0.1 --user
import kfp
from kubernetes.client.models import V1EnvVar

@kfp.dsl.component
def my_component1():
    ...
    return kfp.dsl.ContainerOp(
        name='<docker image name>',          # Component 1 docker name
        image='<docker image name>:<tag>'    # Component 1 docker name:tag
    )

@kfp.dsl.component
def my_component2():
    ...
    return kfp.dsl.ContainerOp(
        name='<docker image name>',          # Component 2 docker name
        image='<docker image name>:<tag>'    # Component 2 docker name:tag
    )

# Defining the pipeline that uses the above components.
@kfp.dsl.pipeline(name='pipeline2', description='mnist tf model')
def model_generation():
    component1 = (my_component1()
        .add_env_variable(V1EnvVar(name='ip', value='xx.xx.xxx.xx'))      # VM IP
        .add_env_variable(V1EnvVar(name='user', value='xxxx'))            # VM user / root
        .add_env_variable(V1EnvVar(name='passwd', value='xxxxxx'))        # VM user / root password
    )
    component1.container.set_image_pull_policy("IfNotPresent")
    component2 = my_component2().after(component1)
    component2.container.set_image_pull_policy("IfNotPresent")

pipeline_func = model_generation
pipeline_filename = pipeline_func.__name__ + '.pipeline2.zip'

# Compile the pipeline and export it
kfp.compiler.Compiler().compile(model_generation, pipeline_filename)
In case the kfp package is missing, uncomment and run the commented line in the first cell at the top of the notebook :
>> pip install kfp==1.0.1 --user
This generates the pipeline YAML file, packaged as a .zip, on the Jupyter home page.
Download it. The model_generation.pipeline2.zip then needs to be uploaded to the Kubeflow Dashboard --> Pipelines
>> Choose file -- Select model_generation.pipeline2.zip from the Downloads folder
>> Click on Create Pipeline
-- Pipeline gets created in the Pipeline Homepage
>> Click on the created Pipeline
>> Create Run -- This would execute the Pipeline
>> Logs can be seen in the web console.
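As an aside, the upload-and-run steps can also be scripted from the same notebook instead of the web UI. A minimal sketch using the kfp client (the experiment and run names are assumptions; depending on the Kubeflow setup the in-cluster client may need extra host/auth arguments) :
import kfp
client = kfp.Client()   # in-cluster client
# Upload the compiled package and start a run of it
client.upload_pipeline(pipeline_package_path=pipeline_filename, pipeline_name='pipeline2')
experiment = client.create_experiment('pipeline2-experiment')
client.run_pipeline(experiment.id, 'pipeline2-run-1', pipeline_package_path=pipeline_filename)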