Q1. What are pipelines ?
A1.
A circular hose .. which runs miles and carries water, oil, waste water .. ehh ??? Okay let me re-phrase .. what are DevOps Pipelines ?? Aah
A DevOps pipeline is a flow methodology in which a set of operations is carried out in a serial manner.
A DevOps pipeline is a set of practices that the development (Dev) and operations (Ops) teams implement to build, test, and deploy software faster and easier. One of the primary purposes of a pipeline is to keep the software development process organized and focused.
Actually, "Pipeline" seems a bit misleading here .. unless pipes are meant to be broken and junctions fitted .. an assembly line would be more apt ! But yeah, "AI DevOps Assembly Line" won't sound as catchy ;)
This also means that, similar to software development, we have a continuously running and improving assembly line. At each stage of an assembly line (chassis making, fitting engines, fitting the gearbox, painting etc.), each task is refined to be the best it can be. The stages are then assembled in a pipelined manner to ensure the best product in the end.
Q2. So how many tasks (like chassis making, fitting engines, fitting gearbox, painting etc) are you planning to do here ?
A2. Well, an analogy for me here would be :
a) The developer writes ML code and unit-tests it.
b) The ML code needs a large amount of data (read: the training set); the unit test in step a) uses only a small subset.
c) Dockerize the code along with all its dependencies and create a container.
d) By dependencies I also mean the data set (which could again be imported from Keras).
e) Deploy the container onto Kubeflow.
When you develop and deploy an ML system, the ML workflow typically consists of several stages. Developing an ML system is an iterative process. You need to evaluate the output of various stages of the ML workflow, and apply changes to the model and parameters when necessary to ensure the model keeps producing the results you need.
In our experiment, we will create a model for identifying fashion apparel. The training data set is loaded from Keras (classification.py); this creates a model [mymodel.h5], and another Python program, replicating a production environment, compares its test data against this model and derives an intelligent output.
Q3. Now what on earth is that ? Kubeflow ?
A3. Kubeflow is a free and open-source machine learning platform designed to enable using machine learning pipelines to orchestrate complicated workflows running on Kubernetes.
Q4. Very well .. are we in an Ekta Kapoor era .. what next, "Kyunki Saas Bhi Kabhi Bahu Thi" ? Why does everything start with "K" .. what is Kubernetes ?? Keras ??
A4. Kubernetes (commonly stylized as K8s) is an open-source container-orchestration system for automating computer application deployment, scaling, and management.
Keras is an open-source neural-network library written in Python.
Great .. now it's starting to make some sense :)
It's like .. say I have a laptop running Ubuntu OS. Here Kubernetes plays the role of the laptop, replacing the hardware with virtual hardware, and then I have a software process called Kubeflow which helps spawn multiple workloads on K8s to realize multiple deployments.
Keras just comes in between, to help develop the ML code in Python.
This is what my folder structure looks like :

We are more interested in the orange-boxed files.
Pipeline/pipeline2/component1/tf/classification.py
from __future__ import absolute_import, division, print_function, unicode_literals
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
#Printing the Tensorflow version
print(tf.__version__)
# Getting fashion mnist dataset from Keras
# Splitting datasets into Test and Train datasets
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
#Labelling the datasets
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images.shape
....
....
# Grab an image from the test dataset.
img = test_images[1]
# Add the image to a batch where it's the only member.
img = (np.expand_dims(img,0))
predictions_single = model.predict(img)
plot_value_array(1, predictions_single[0], test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)
np.argmax(predictions_single[0])
#Saving model
model.save('mymodel.h5')
predictions[9]
np.argmax(predictions[9])
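The "...." above elides the data normalization, model definition and training. A minimal sketch of what that part typically looks like for Fashion-MNIST (the layer sizes and epoch count here are illustrative assumptions, not necessarily the exact values used):
# Illustrative sketch of the elided section -- normalization, model build, training.
# Normalize pixel values to the 0-1 range.
train_images = train_images / 255.0
test_images = test_images / 255.0
# A simple dense network for the 28x28 Fashion-MNIST images (layer sizes assumed).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train on the Keras training split (epoch count assumed).
model.fit(train_images, train_labels, epochs=10)
# Predictions over the whole test set (used above, e.g. predictions[9]).
predictions = model.predict(test_images)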
Pipeline/pipeline2/component1/Dockerfile
FROM tensorflow/tensorflow:nightly-py3-jupyter
RUN apt-get update &&\
apt-get upgrade -y &&\
apt-get install -y git && \
pip install matplotlib
COPY . .
ENV ip "XX.XX.XXX.XXX" // Replace with your VM IP, where the local GitLab ought to run.
ENV user "root" // Replace with user (else root is you use public-keys) [More on how to create root user below]
ENV passwd "xxxxx" // Replace with the user password or root passwd
CMD /usr/bin/git config --global user.name "root" && \
/usr/bin/git config --global user.email "rajesh.bhaskaran@aricent.com" && \
/usr/bin/git clone http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git && \
/usr/bin/python3 ./tf/classification.py && cd tensorflow/ && cp ../mymodel.h5 . && \
/usr/bin/git pull --allow-unrelated-histories && \
/usr/bin/git add mymodel.h5 && \
/usr/bin/git commit "-m" "Commit 1.0" && \
/usr/bin/git push http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git
>> [More on how to create a root user here]
rajeshb@AzureUbuntu:~$ sudo passwd   # This creates and sets a password for the root user
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
This dockerizes the program into a container image; when the container runs, the model file mymodel.h5 is created and pushed to the GitLab repo.
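For reference, the component image itself is built (and, if you use a registry, pushed) in the usual way; the image name and tag below are placeholders -- use whatever names you later reference in the pipeline code :
rajeshb@AzureUbuntu:~/Pipeline/pipeline2/component1$ sudo docker build -t <docker-image-name>:<tag> .
rajeshb@AzureUbuntu:~/Pipeline/pipeline2/component1$ sudo docker push <docker-image-name>:<tag>   # only if a registry is used
Since minikube is started with --vm-driver none (see below), the cluster shares the host's Docker daemon, which is also why the pipeline later sets the image pull policy to IfNotPresent.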
Pipeline/pipeline2/component2/tf/prediction.py
from __future__ import absolute_import, division, print_function, unicode_literals
import os
import tensorflow as tf
from tensorflow import keras
#Load model
new_model = tf.keras.models.load_model('mymodel.h5')
fashion_mnist = keras.datasets.fashion_mnist
# Load Inference data
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
# Getting model metrics
test_loss, test_acc = new_model.evaluate(test_images, test_labels, verbose=2)
#Print Test accuracy
print('\nTest accuracy:', test_acc)
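The script above only prints the overall accuracy. If you also want the "intelligent output" mentioned earlier (a class name for a single image), a small optional addition to prediction.py could look like this (class_names copied from classification.py):
import numpy as np
# Class labels, in the same order as in classification.py
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Predict on a single test image, mimicking a production inference call
img = np.expand_dims(test_images[0], 0)          # batch with one member
prediction = new_model.predict(img)
print('Predicted class:', class_names[np.argmax(prediction[0])])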
Pipeline/pipeline2/component2/Dockerfile
FROM omie12/image-class:5.0
RUN pip install -q pyyaml h5py && \
rm -rf tensorflow
COPY . .
# Settings same as the above Dockerfile
ENV ip "xx.xx.xx.xxx"
ENV user "xxx"
ENV passwd "xxxxxx"
CMD /usr/bin/git clone http://${user}:${passwd}@${ip}:30080/${user}/tensorflow.git && cd tensorflow && ls && /usr/bin/python3 ../tf/prediction.py
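Before wiring the components into a pipeline, each container can be smoke-tested locally. The image name below is a placeholder and the -e flags simply override the ENV values baked into the Dockerfile :
rajeshb@AzureUbuntu:~$ sudo docker run --rm -e ip=<VM-IP> -e user=<user> -e passwd=<password> <component-image>:<tag>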
Setting up the other components of this assembly line : 1. GitLab : running it locally
The GitLab Docker images are monolithic images of GitLab running all the necessary services in a single container.
The following command runs a local instance of GitLab :
rajeshb@AzureUbuntu:~$ sudo docker run --detach --name gitlab --hostname localhost --publish 30080:30080 --publish 30022:22 --env GITLAB_OMNIBUS_CONFIG="external_url 'http://localhost:30080'; gitlab_rails['gitlab_shell_ssh_port']=30022;" gitlab/gitlab-ce:9.1.0-ce.0
>> Open the local GitLab on : http://VM-IP:30080
>> Click on New Project to create a project named tensorflow (in our case)
>> Create and commit a README file in the new project (a command-line equivalent is sketched below).
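In case you prefer the command line over the web UI for the README step, the equivalent would be something like this (the placeholders are the same as in the Dockerfiles) :
rajeshb@AzureUbuntu:~$ git clone http://<user>@<VM-IP>:30080/<user>/tensorflow.git
rajeshb@AzureUbuntu:~$ cd tensorflow && echo "# tensorflow" > README.md
rajeshb@AzureUbuntu:~/tensorflow$ git add README.md && git commit -m "Add README" && git push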
2. Jupyter Notebook Server
Jupyter is an open-source service for interactive computing across dozens of programming languages. Spun off from IPython in 2014 by Fernando Pérez, Project Jupyter supports execution environments in several dozen languages. We can use Jupyter to run the Python script that generates the pipeline YAML file.
Jupyter is part of the Kubeflow platform. The two steps below should get it up and running :
rajeshb@AzureUbuntu:~/kubeflow$ pwd
/home/rajeshb/kubeflow
rajeshb@AzureUbuntu:~/kubeflow$ sudo minikube start --vm-driver none
> Minikube should start successfully !!
rajeshb@AzureUbuntu:~/kubeflow$ kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
> All YAML processes should be deployed successfully !!!
> INFO[0014] Applied the configuration Successfully! < -- Should be seen
Possible errors : Permission denied
Solution : rajeshb@AzureUbuntu:~/kubeflow$ sudo chown -R $USER:$USER ~/.minikube/
This sets the ownership of the hidden .minikube folder to your own user.
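To check that Kubeflow is actually up and to reach its Dashboard, the usual kubectl checks are (the local port 8080 below is just an arbitrary choice) :
rajeshb@AzureUbuntu:~/kubeflow$ kubectl get pods -n kubeflow        # all pods should eventually be Running
rajeshb@AzureUbuntu:~/kubeflow$ kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
>> Open the Kubeflow Dashboard on : http://localhost:8080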
Step 1 : Create the Pipeline
> Create a Jupyter Notebook to run the Python code. Open the Kubeflow Dashboard

Launch the Notebook with the above configurations


#pip install kfp==1.0.1 --user
import kfp
from kubernetes.client.models import V1EnvVar

@kfp.dsl.component
def my_component1():
    ...
    return kfp.dsl.ContainerOp(
        name='<docker image name>',          # Component 1 docker name
        image='<docker image name>:<tag>'    # Component 1 docker name:tag
    )

@kfp.dsl.component
def my_component2():
    ...
    return kfp.dsl.ContainerOp(
        name='<docker image name>',          # Component 2 docker name
        image='<docker image name>:<tag>'    # Component 2 docker name:tag
    )

# Defining the pipeline that uses the above components.
@kfp.dsl.pipeline(name='pipeline2', description='mnist tf model')
def model_generation():
    component1 = (my_component1()
        .add_env_variable(V1EnvVar(name='ip', value='xx.xx.xxx.xx'))      # VM IP
        .add_env_variable(V1EnvVar(name='user', value='xxxx'))            # VM user / root
        .add_env_variable(V1EnvVar(name='passwd', value='xxxxxx'))        # VM user / root password
    )
    component1.container.set_image_pull_policy("IfNotPresent")
    component2 = my_component2().after(component1)
    component2.container.set_image_pull_policy("IfNotPresent")

pipeline_func = model_generation
pipeline_filename = pipeline_func.__name__ + '.pipeline2.zip'

# Compile the pipeline and export it
kfp.compiler.Compiler().compile(model_generation, pipeline_filename)
In case the kfp package is missing, uncomment and run the commented line in the first cell at the top of the notebook :
>> pip install kfp==1.0.1 --user
This generates the pipeline YAML file, packaged as a .zip, on the Jupyter home page.
Download it. The model_generation.pipeline2.zip then needs to be uploaded to the Kubeflow Dashboard --> Pipelines
>> Choose file -- Select model_generation.pipeline2.zip from the Downloads folder
>> Click on Create Pipeline
-- Pipeline gets created in the Pipeline Homepage
>> Click on the created Pipeline
>> Create Run -- This would execute the Pipeline
>> Logs can be seen in the web console.
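As an aside, the upload-and-run steps can also be scripted from the same notebook instead of the web UI. A minimal sketch using the kfp client (the experiment and run names are assumptions; depending on the Kubeflow setup the in-cluster client may need extra host/auth arguments) :
import kfp
client = kfp.Client()   # in-cluster client
# Upload the compiled package and start a run of it
client.upload_pipeline(pipeline_package_path=pipeline_filename, pipeline_name='pipeline2')
experiment = client.create_experiment('pipeline2-experiment')
client.run_pipeline(experiment.id, 'pipeline2-run-1', pipeline_package_path=pipeline_filename)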