Delivering Data Science for the Enterprise with Shiny (R) in Kubernetes

Kublr Team | Sep 26, 2017

As organizations grow and mature, so does their data. Eventually, the data may become too ‘big’ or too complicated for traditional analytics tools. This growth and the resulting challenge have led to many well-established artificial intelligence (AI) applications in the marketplace.

The hype about AI technologies is big: everyone is talking about how AI will revolutionize our lives, touch every home, and solve all our problems. But the promise of AI is not always realized. How can your organization make the leap to AI nirvana?

As an enterprise, your infrastructure is likely rather traditional, and your classic service-based solutions are developed and run within a perfectly adjusted workflow that must be maintained. To adopt AI technology, you must find a way to leverage it while keeping it compatible with your internal workflows.

In this post, we’ll show you how Shiny (R) can help you adopt and integrate AI technology.

There are two main programming languages for AI and data analysis: Python and R. Both languages need a wrapping framework (for example, Flask for Python or Shiny for R) for interaction and visualization.

Shiny (R) is a powerful and user-friendly tool, which is why we recommend a continuous integration and continuous deployment (CI/CD) approach for a Shiny (R)-in-Kubernetes cloud solution.

Here is a Shiny (R) example:

[Image: Shiny (R) example application]

This is the big-picture view of a typical Shiny (R) development flow:

[Image: typical Shiny (R) development flow]

The R project is basically a bunch of scripts. We must add these scripts to the Docker Image, and either configure a connection to a database or mount a disk volume with data to analyze.

Next, we run unit tests to ensure the scripts work properly. We’ve chosen Jenkins to run the builds and tests because it’s already a part of the technology stack in our project. (We use Jenkins to build and deliver developed services.)

Here’s how this process works:

[Image: CI/CD process with Jenkins, Docker, and Kubernetes]

To implement this scenario, we will:

Build a Docker Image with Jenkins, Docker, and Kubernetes Control Bundled.
Build a Shiny (R) Docker Image.
Deploy Shiny (R) and Jenkins in Kubernetes.
Configure Jenkins.

Build a Docker Image with Jenkins, Docker, and Kubernetes Control Bundled

We have created a Dockerfile based on the official Jenkins Docker Image (jenkins/jenkins:lts).

Inside the Kubernetes cluster, you are basically in a Docker-inside-Docker situation. Install Docker CE in the image and pass /var/run/docker.sock through from the Kubernetes node, so that Jenkins shares the node’s Docker daemon.

You must also install kubectl to control the Kubernetes cluster. Place the Kubernetes “config” file in the .kube directory under the user’s home directory so that kubectl has access to the cluster without any additional setup.

The resulting Dockerfile for our Jenkins will look like this:

FROM jenkins/jenkins:lts

EXPOSE 8080 50000

USER root

# Install prerequisites for Docker
RUN apt-get update && apt-get install -y sudo iptables libsystemd-journal0 init-system-helpers libapparmor1 libltdl7 libseccomp2 libdevmapper1.02.1 && rm -rf /var/lib/apt/lists/*

ENV DOCKER_VERSION=docker-ce_17.03.0~ce-0~ubuntu-trusty_amd64.deb
ENV KUBERNETES_VERSION=v1.6.6

# Set up Docker
RUN wget https://download.docker.com/linux/ubuntu/dists/trusty/pool/stable/amd64/$DOCKER_VERSION
RUN dpkg -i $DOCKER_VERSION

# Set up Kubernetes
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$KUBERNETES_VERSION/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin/kubectl

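# Configure access to the Kubernetes Cluster
# ("~" is not expanded in a Dockerfile; the container runs as root, so the home directory is /root)
COPY install/config /root/.kube/config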

ENTRYPOINT ["/bin/tini", "--", "/usr/local/bin/jenkins.sh"]

Build a Shiny (R) Docker Image

You can speed up builds by splitting the work across two Docker Images: a base Docker Image that contains only Shiny (R), named “shiny-r”, and a deployment Docker Image that inherits from the base one and is named “shiny-r-bundle”. The slow installation steps then run only when the base image changes.

For the base Docker Image, we use CentOS 6.6 because we plan to use the Shiny Server build for CentOS 6. Our next steps are:

Install R (from yum repository).
Install database access libraries (unixodbc, freetds).
Override some options in the .Rprofile file.
Install the required R packages: shiny, rmarkdown (for reporting), testthat (for unit tests).
Install the ‘RODBC’ R package (for database access).
Install an open-source version of the Shiny Server.

The base Dockerfile for Shiny (R) is:

FROM centos:6.6

RUN rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
RUN yum -y install yum-plugin-ovl
RUN yum -y install R
RUN yum -y install unixodbc freetds

RUN echo "options(repos = c(CRAN = \"https://cran.rstudio.com\"))" > ~/.Rprofile

RUN R -e "install.packages('shiny')"
RUN R -e "install.packages('rmarkdown')"
RUN R -e "install.packages('testthat')"
RUN R -e "install.packages('RODBC', type = 'source')"
RUN R -e "sessionInfo()"

ENV R_STUDIO_VERSION=shiny-server-1.5.4.858-rh6-x86_64.rpm
RUN curl https://download3.rstudio.org/centos6.3/x86_64/$R_STUDIO_VERSION > $R_STUDIO_VERSION
RUN yum clean all; yum -y install --nogpgcheck $R_STUDIO_VERSION

EXPOSE 3838

CMD ["/opt/shiny-server/bin/shiny-server"]

For Shiny to work, install the ‘shiny’ and ‘rmarkdown’ packages. ‘testthat’ will be used for unit testing, and ‘RODBC’ will be used for accessing a traditional RDBMS.

To deploy the Shiny (R) project, create the derived Docker Image, which also runs the unit tests during the build:

FROM some-registry.com/shiny-r:latest

COPY config/* /usr/local/etc/

RUN rm /srv/shiny-server/index.html
RUN rm -rf /srv/shiny-server/sample-apps
COPY install /srv/shiny-server

# Run every test runner; abort the build as soon as one of them fails
RUN echo; echo "TESTS:"; \
    for file in $(find /srv/shiny-server -name TEST_RUNNER.R);  \
    do  \
      echo "     --------- RUN TEST: $file -------------"; \
      cd "$(dirname "$file")" || exit 1; \
      R --quiet --no-save < "$file" || exit 1; \
    done

CMD ["/opt/shiny-server/bin/shiny-server"]

Supply a test runner script (named ‘TEST_RUNNER.R’) for every R project. These scripts are executed during the build of the Docker Image.
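The same loop can be tried outside of a Docker build, for example on a developer machine. The following is a minimal sketch, not the project's actual tooling: the function name `run_test_runners` and the `R_RUNNER` override are hypothetical and exist only so the loop can be exercised without R installed.

```shell
#!/bin/sh
# Sketch: run every TEST_RUNNER.R below a directory and stop at the first failure.
# R_RUNNER is a hypothetical override (not part of R or Shiny); by default the
# function uses the same R invocation as the Dockerfile.
run_test_runners() {
    root="$1"
    r_runner="${R_RUNNER:-R --quiet --no-save}"
    # The explicit subshell keeps "exit 1" from terminating the calling shell.
    (
        find "$root" -name TEST_RUNNER.R | sort | while IFS= read -r runner; do
            echo "--------- RUN TEST: $runner ---------"
            # Change into the runner's directory so that relative paths
            # such as 'server.R' and 'tests' resolve.
            ( cd "$(dirname "$runner")" && $r_runner < "$(basename "$runner")" ) || exit 1
        done
    )
}
```

Calling `run_test_runners /srv/shiny-server` would mirror the build step; a non-zero return status signals that at least one runner failed.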

Example of “TEST_RUNNER.R”:

library('testthat')
source('server.R')
test_dir('tests', reporter = 'Summary')

This test runner loads ‘server.R’ and verifies it by executing all test scripts in the ‘tests’ directory.

Example of an R unit test (‘tests/test_server.R’):

# Fibonacci() is expected to be defined in server.R, which the test runner sources
test_that("Fibonacci returns the expected values", {
  expect_error(Fibonacci(-1))
  expect_equal(Fibonacci(0), 0)
  expect_equal(Fibonacci(1), 1)
  expect_equal(Fibonacci(2), 1)
  expect_equal(Fibonacci(10), 55)
})

Deploy Shiny (R) and Jenkins in Kubernetes

To deploy Shiny (R) and Jenkins into the Kubernetes cluster, we must supply deployment and service definitions:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: shiny-r
  namespace: default
  labels:
    app: shiny-r
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: shiny-r
    spec:
      containers:
      - name: shiny-r
        imagePullPolicy: Always
        image: some-registry.com/shiny-r-bundle:latest
        ports:
        - containerPort: 3838
        resources:
          limits:
            cpu: 100m
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 4Gi
        readinessProbe:
          httpGet:
            path: /
            port: 3838
        volumeMounts:
        - name: shiny-r-storage
          mountPath: /opt/shiny
      volumes:
      - name: shiny-r-storage
---
apiVersion: v1
kind: Service
metadata:
  name: shiny-r-service
  namespace: default
  labels:
    app: shiny-r
spec:
  type: LoadBalancer
  ports:
    - port: 3838
  selector:
    app: shiny-r

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jenkins-ci
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: jenkins-ci
    spec:
      containers:
      - name: jenkins-ci
        imagePullPolicy: Always
        image: some-registry.com/jenkins-ci:latest
        ports:
        - containerPort: 8080
        - containerPort: 50000
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 40
          periodSeconds: 20
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/run
          name: docker-sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run
---
apiVersion: v1
kind: Service
metadata:
  name: jenkins-ci-lb
spec:
  type: LoadBalancer
  ports:
  - name: jenkins
    port: 8080
    targetPort: 8080
  - name: jenkins-agent
    port: 50000
    targetPort: 50000
  selector:
    name: jenkins-ci

Note:

You will need one Jenkins node, and its hardware requirements are very low: two CPU cores and 512 MB of RAM should be sufficient.
Determine the number of Shiny (R) instances based on the number of active users and high-availability requirements. Shiny (R) nodes should not be starved of computing power: ensure you provide the maximum available CPU cores and enough RAM.
If you run multiple Shiny (R) nodes, you must use “sticky sessions” in your load balancing (for example, by using an Ingress instead of the LoadBalancer service).

Configure Jenkins

Make the following two low-level configuration changes before starting to work with Jenkins in Kubernetes:

Set the parameter excludeClientIPFromCrumb=true in the file /var/jenkins_home/config.xml to fix a “No valid crumb was included in the request” error.
Run “docker login” so that the Docker client used by Jenkins is authenticated against the correct Docker Registry.
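The first change can be scripted. The sketch below assumes that config.xml already contains an `<excludeClientIPFromCrumb>false</excludeClientIPFromCrumb>` element; the function name `set_exclude_client_ip` is hypothetical. Jenkins has to be restarted, or its configuration reloaded from disk, before an edit to config.xml takes effect.

```shell
#!/bin/sh
# Sketch: flip excludeClientIPFromCrumb from false to true in a Jenkins config.xml.
# Assumes the element is already present with the value "false";
# if it is missing, the file is left unchanged.
set_exclude_client_ip() {
    config="$1"
    tmp="$config.tmp"
    sed 's|<excludeClientIPFromCrumb>false</excludeClientIPFromCrumb>|<excludeClientIPFromCrumb>true</excludeClientIPFromCrumb>|' \
        "$config" > "$tmp" || return 1
    # Write back through the original file so its owner and permissions are kept.
    cat "$tmp" > "$config" && rm -f "$tmp"
}
```

With the paths used in this post, the call would be `set_exclude_client_ip /var/jenkins_home/config.xml`.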

Now we can create the new Jenkins project:

[Image: creating the new Jenkins project]

Next, we add the build parameter:

[Image: adding the build parameter]

Then we configure access to the source code repository:

[Image: configuring access to the source code repository]

And, finally, we add a build step:

cp -r r/* $ROOT_DOCKER/install

#Docker build and publish
DOCKER_NAME="shiny-r-bundle"
DOCKER_IMAGE="some-registry.com/$DOCKER_NAME:$BUILD_NUMBER"
cd $ROOT_DOCKER
docker pull some-registry.com/shiny-r
docker build -t $DOCKER_NAME .
docker tag $DOCKER_NAME $DOCKER_IMAGE
docker push $DOCKER_IMAGE

#Kubernetes redeploy
kubectl set image deployment/shiny-r shiny-r=$DOCKER_IMAGE

In this script, we copy the R script files, rebuild the ‘shiny-r-bundle’ Docker Image, and roll out the new Docker Image in the Kubernetes cluster.
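A slightly more defensive variant of this build step is sketched below. It is not the script used in the project: the `run` helper, the `DRY_RUN` switch, and the `build_and_deploy` function are hypothetical names introduced for illustration. The sketch tags the image with the build number directly and adds `kubectl rollout status`, which blocks until the rollout finishes.

```shell
#!/bin/sh
set -eu

# Print the command instead of executing it when DRY_RUN=1
# (a hypothetical switch for trying the script without Docker or a cluster).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: build_and_deploy REGISTRY IMAGE_NAME BUILD_NUMBER
build_and_deploy() {
    registry="$1"
    name="$2"
    build_number="$3"
    image="$registry/$name:$build_number"

    run docker pull "$registry/shiny-r"             # refresh the base image
    run docker build -t "$image" .                  # build and tag in one step
    run docker push "$image"
    run kubectl set image deployment/shiny-r "shiny-r=$image"
    run kubectl rollout status deployment/shiny-r   # block until the rollout finishes
}
```

In a Jenkins job, the call would be `build_and_deploy some-registry.com shiny-r-bundle "$BUILD_NUMBER"`, run from the directory that contains the Dockerfile. Setting `DRY_RUN=1` first prints the five commands without executing them.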

Conclusion

Let’s take a look at the results of our efforts. In this example, we’ve set up Git polling to run every minute (cron expression ‘* * * * *’), and we have committed some code changes to the repository.

[Image: Jenkins build results after the commit]

Redeploying in one minute is a great result for a CI/CD pipeline, and it leaves plenty of time for rigorous unit tests. Check out our post on Running Spark with Jupyter Notebook & HDFS on Kubernetes to learn more about running data science workloads on Kubernetes.

Interested in simplifying the management of your Kubernetes cluster? Consider using a Kubernetes management platform such as Kublr.