Upgrading creates a new version of an existing package that adds or modifies the modules of an already defined package. Here we upgrade a package built for Ubuntu Trusty to a newer release.

Upgrading

fpm is a package management tool that builds packages in several formats.

The fpm command is usually kept in a build script file.

Syntax for fpm:

fpm -s <input type> -t <output type> [options]

The options decide our package configuration; a minimal example follows the option list below.

Options:

-t OUTPUT_TYPE
    The type of package you want to create (deb, rpm, solaris, etc.)
-s INPUT_TYPE
    The package type to use as input (gem, rpm, python, etc.)
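As a minimal sketch of that syntax, converting a Ruby gem into a Debian package looks like this (the gem name json is only an illustrative example):

# -s gem : take a Ruby gem as input (fetched from rubygems.org if not local)
# -t deb : produce a Debian package as output
fpm -s gem -t deb json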

Check whether the dependencies of the version you are upgrading to are compatible with the newer release.

-d, --depends DEPENDENCY
A dependency. This flag can be specified multiple times. The value is usually in the form -d 'name' or -d 'name > version'.

For example, Ubuntu Trusty ships with Ruby 2.0, but Debian Jessie ships with Ruby 2.1, so we have to change all such dependencies accordingly.

Then add the before-upgrade and after-upgrade script files.

--after-upgrade FILE
A script to be run after package upgrade. If not specified, --before-install, --after-install, --before-remove, and --after-remove will behave in a backwards-compatible manner (they will not be upgrade-case aware).

--before-upgrade FILE

A script to be run before package upgrade. If not specified, --before-install, --after-install, --before-remove, and --after-remove will behave in a backwards-compatible manner (they will not be upgrade-case aware).

Currently, these options only support deb and rpm packages.

The FILE argument is the script that carries out the pre-upgrade or post-upgrade configuration.
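Putting the pieces together, a hedged sketch of a full fpm invocation could look like the following; the package name, version, source path and dependency version are placeholders for your own values:

# Build version 2.0 of a hypothetical "myapp" .deb from a directory tree,
# declaring the Ruby dependency and wiring in the upgrade scripts.
fpm -s dir -t deb \
    -n myapp -v 2.0 \
    -d 'ruby > 2.1' \
    --before-upgrade preupgrade.sh \
    --after-upgrade postupgrade.sh \
    /path/to/myapp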

The before-upgrade script defines the steps to be performed before upgrading the package.

Before upgrading, we have to ask the user whether to continue with the existing configuration settings or switch to the new configuration settings.

If the user opts for the new configuration settings, we copy the new configuration file to the appropriate location.

For example, the pre-upgrade file could contain the script below:

#!/bin/sh
# Pre-upgrade script: ask the user (up to three times) whether the new
# configuration file should replace the existing one.
count=0
max_attempts=3
while [ "$count" -ne "$max_attempts" ]; do
    count=$((count + 1))
    printf "Do you want to use the new configuration file? [Y/N] : "
    read answer
    if [ "$answer" = "y" ] || [ "$answer" = "Y" ]; then
        # Copy the new configuration file to its target location
        # (replace the source path with your own configuration file path).
        cp /your/conf/file/path/conf-file /var/lib//conf-file
        echo "New configuration file copied."
        break
    fi
    if [ "$answer" = "n" ] || [ "$answer" = "N" ]; then
        echo "Keeping the old configuration file."
        break
    fi
    if [ "$count" -eq "$max_attempts" ]; then
        echo "Three attempts reached; keeping the old configuration file."
    else
        echo "Please enter a valid input (Y/N)."
    fi
done

The after-upgrade script defines the steps to be performed after upgrading the package.

After upgrading, start your service:

systemctl enable servicefile.service
systemctl start servicefile.service
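The after-upgrade FILE can be as small as a shell script wrapping those commands; a minimal sketch (servicefile.service is just the placeholder unit name from the example above):

#!/bin/sh
# Post-upgrade script: reload unit definitions and bring the service back up.
systemctl daemon-reload
systemctl enable servicefile.service
systemctl start servicefile.service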

An Introduction to Urknall

Urknall is a Go-based provisioning and automation tool for administering complex infrastructure. It is agentless, relying only on common UNIX tools, and provides decent caching.

It provides a template mechanism that helps with reusability: Urknall ships some basic templates, but lets users modify those or add new ones to solve the specific problem at hand.

Because everything is written in Go, it provides the benefits of a compiler, helping to catch bugs early and making refactoring easy, and it produces single-binary infrastructure management tools with no dependencies.

The library part of Urknall provides the core mechanisms to execute commands on a remote host.

It provides four kinds of interfaces to handle commands (commands in the sense of shell commands).

About Urknall mechanisms

Commands Interface - to run shell commands on the target.
Logger Interface - to simplify logging output and to track which commands are executed.
Renderer Interface - to use an entity's properties in command strings via Go's templating.
Validator Interface - to do more complex validations.

Packages - packages are a strictly internal data structure.
Tasks - tasks are ordered collections of commands.
Templates - templates define the list of tasks that should be performed during provisioning.
Targets - where the commands are executed: on a remote host or locally.

	Remote Target - uses SSH to connect to the remote machine.
	Local Target - provisions the local host.
	Sudo Without Password - the target user must be allowed to run sudo without a password; this is done by:

echo "username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-nopassword
Urknall binary

The urknall binary helps manage projects that use the library, while the urknall library provides the handling of targets, tasks, and caching.

It is fully integrated with the library, i.e. the implementations of some basic commands and templates are part of the library itself.

We can change the templates directly, but the corresponding code has to be moved into our project manually.

Chef vs Urknall

**Urknall**
	Ties users to Golang.
	Helps catch bugs early and makes refactoring easy.
	Supports Linux.
	Easy to learn and deploy.
	Go is flexible enough to use both YAML and JSON.
	No dependencies needed.

**Chef**
	Ties users to Ruby.
	Larger community, with a large collection of modules and configuration recipes.
	Full support for Linux, Unix, and Windows.
	Not as easy to learn and deploy, and the documentation still needs a lot of work.
	Relies on JSON, which is not as friendly as YAML.

It is not an uncommon fact that Cassandra is the de facto NoSQL database in the big data world at the moment. It is known for its ease of use and performance, and for the constant push DataStax gives in building the community. But there is a new kid in town, a very powerful kid, like one of those kids from Spy Kids. Yes, ScyllaDB.

Scylla is an open-source NoSQL database that is Apache Cassandra compatible, with claimed performance up to 10x that of Cassandra. Scylla has been showing promising results so far, with very low latency.

It is currently at version 0.16, with GA coming out very soon, and it is going to be really interesting to see how it works out for a lot of big data use cases.

In this blog post we will focus on how to set up Scylla and how the data modelling works. If you are already a Cassandra expert, just head over to the Google group, which is very active, and ask your queries there. But the documentation does not cover things for non-Cassandra folks, and I am hoping this will be helpful for those people in particular.

1. Setting it up:

This article here gives the steps in downloading and setting it up.

A few things to note: all configuration setup and changes should be done in the YAML file at `/var/lib/conf/scylla.yml`.

Note: make sure you add SCYLLA_ARGS="--developer-mode true" in your scylla-server file; by default Scylla looks for an XFS filesystem and we need ext.

Once you install scylla-server and scylla-jmx, run:

nodetool status

And it will display,

 Datacenter: datacenter1
 =======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns    Host ID                               Rack
UN  103.56.92.54  211.48 KB  256     ?       6c78937a-2d1f-4dfa-ad47-98405fbd2eff  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
  • Change listen_address to the public IP, as in the example above.

  • Make sure you change rpc_address if you want to communicate through the CQL native protocol remotely; otherwise you will get a NoHostAvailableException.

  • Change api_address to the public IP to access the REST API server. (This is really cool and one of the reasons I really like riak.) A rough sketch of these edits follows this list.
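As a rough sketch of those edits, assuming the scylla.yml path mentioned earlier and that the keys are present and uncommented, the public IP from the example output could be set in place with sed:

# Point the node's addresses at the public IP (example address from the nodetool output).
# Adjust the path if your scylla.yml lives elsewhere.
sudo sed -i 's/^listen_address:.*/listen_address: 103.56.92.54/' /var/lib/conf/scylla.yml
sudo sed -i 's/^rpc_address:.*/rpc_address: 103.56.92.54/' /var/lib/conf/scylla.yml
sudo sed -i 's/^api_address:.*/api_address: 103.56.92.54/' /var/lib/conf/scylla.yml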

You have the DB all set; it is up and running.

#### 2. Cassandra compatibility - cqlsh

Scylla has good Cassandra compatibility and there is a healthy set of driver support as well. (Driver status)

To get the CQL shell, type cqlsh in your shell.

###### 1. Creating a keyspace

CREATE KEYSPACE musiclibrary WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };

This creates a keyspace (a database in the RDBMS world).

Note: use the RDBMS comparison only up to a point, until you understand the data modelling concepts. For designing complex data models, it is good to grasp the Cassandra/Scylla way of thinking.

USE "musiclibrary";
CREATE TABLE rockmusic(
            band text,
            genre text,
            era int,
            PRIMARY KEY (era, band)
            );

This creates a table called rockmusic. Let us now insert data:

  INSERT INTO rockmusic(band, genre, era) values ('rolling stones', 'rocknroll', 1960);

  INSERT INTO rockmusic(band, genre, era) values ('beatles', 'britpop', 1960);

  INSERT INTO rockmusic(band, genre, era) values ('thewho', 'rock', 1960);

That's it. Just do a SELECT * FROM rockmusic and it will print the rows. You can also set a composite primary key and do a key -> key -> value search.
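For instance, one way to run that query non-interactively from the shell is cqlsh's -e flag:

# Print every row in the rockmusic table without opening an interactive shell.
cqlsh -e "SELECT * FROM musiclibrary.rockmusic;"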

In the next article we will briefly look at data modelling in ScyllaDB. We will cover:

  • ColumnFamily & SuperColumnFamily
  • How to use phantom the scala-java driver.
  • Using apache spark with scyllaDB

That's it for now.

Introduction

If you are using 3rd party packages (packages that you don't own or control), you will want a way to create a reproducible build every time you build your projects. If you use 3rd party packages directly and the package authors change things, your projects could break. Even if things don't break, code changes could create inconsistent behavior and bugs.

The godep tool is a great step in the right direction for managing 3rd party dependencies and creating reproducible builds.

Downloading Godep

Download godep using go get and make sure your $GOPATH/bin directory is in your PATH.

go get github.com/tools/godep
export PATH=$PATH:$GOPATH/bin

How Godep Works

A godep save command will copy all imported packages in their entirety from your current GOPATH into a vendored workspace folder in ./Godeps/_workspace. A list of those packages will be stored with relevant version information in a master file, Godeps/Godeps.json. This is done not just for the packages your project directly imports but also for any imported by your dependencies.

Using Godep is as simple as prepending your normal Go commands like go test or go build with the godep command. This uses a temporarily extended GOPATH which prioritizes the Godep vendor directory.

Saves your GOPATH dependencies to the Godeps folder:

$ godep save ./...

Builds using the Godep vendored dependencies:

$ godep go build ./...

Tests using the Godep vendored dependencies:

$ godep go test ./...

From here, should you apply a change to your GOPATH, your project will be isolated.

Update a dependency:

$ go get -u github.com/golang/protobuf/...

Build using the standard GOPATH:

$ go build ./...

Build using the Godep vendored version:

$ godep go build ./...

godep update versus godep save

godep update takes a specific dependency package and updates the vendored instance of that package with the version in your GOPATH. This will update files with changes, add new files, remove old ones, and update the version SHA listed in the Godeps.json file.

$ go get -u github.com/golang/protobuf/...
$ godep update github.com/golang/protobuf/...

This will not add or remove sub-packages from dependency management, nor will it update any other dependencies recursively. Only previously imported packages are listed in the Godeps.json file and only those listed are updated.

Updating the entire package will update any references to sub-packages; however no new packages will be added, nor old ones removed. Similarly, if your dependency update is dependent upon another change elsewhere in your dependency stack, you may run into issues. godep update only touches the packages listed in Godeps.json, which match the provided package pattern.

In contrast, godep save applies the entire relevant GOPATH to the Godeps folder and will add/remove packages as needed. Because it’s based off of your GOPATH, godep save can also check for build errors and non-clean repositories before applying changes, enforcing dependency cohesion.
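As a hedged sketch of that safer flow, update the dependency in your GOPATH first and then re-save the whole dependency set (the protobuf path simply reuses the example package from above):

# Update the dependency in your GOPATH, then re-vendor everything,
# letting godep save add or remove packages as needed.
$ go get -u github.com/golang/protobuf/...
$ godep save ./...
$ godep go test ./...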

Given the dangers of using godep update (missing packages and dependencies), it's much safer to use godep save. The only situation where it's safe to use godep update is when both of these conditions are satisfied:

No dependencies of your target dependency need to be updated.

No imports were added to or removed from your target dependency.

If the dependency is external to your organization, it can be difficult to determine what changes are taking place, so it is safer to never use godep update on anything third-party.

When to Use Godep

Not every project may require the broad dependency control provided by Godep.

Unlike import path-based vendoring, Godep vendors the entire set of dependencies regardless of a specific desire to version them. This does mean that no dependencies will ever be updated unless explicitly altered, first through GOPATH and then through Godep.

Should your organization have a large number of common dependencies across different projects, you may want to look into using a forked dependency model. Godep provides a locally controlled, customizable dependency management system. When used with care, this system can support highly versioned and reproducible builds, especially in change resistive environments with few shared dependencies.

Spark-jobserver is a really cool RESTful interface for submitting and managing Apache Spark jobs, jars, and job contexts. At Megam, our analytics platform Meglytics is powered by Apache Spark, and we leverage spark-jobserver to execute Spark jobs. In this blog post we will see how to get started with spark-jobserver. Before we go ahead, a big thanks to the Ooyala folks for making spark-jobserver open source. Let's get started.

Note: Make sure you have spark installed locally

1. Running spark-jobserver

For a sanity check:

 $ sudo apt-get update

Now clone the spark-jobserver project:

  $ git clone https://github.com/spark-jobserver/spark-jobserver

To run it, export the version and start the job server from the sbt console:

  $ export VER=`sbt version | tail -1 | cut -f2`
  $ sbt
  > reStart

Your dev setup is done, fire up your browser and point it to localhost:8090 and you can see a not-so-quagmire kinda UI.

Note: For proper deployment you can find the conf and scripts here

2. Building and deploying a jar:

The fundamental steps in setting up and working with SJS are:

  • First, build a jar (like, duh!) with your sparkContext(s) and push it to SJS, where your Spark master is also running (in our case, it is local).

  • Then run the jar by providing the classPath and the name of the jar.

Let us look at the simple wordCount example for now to get all the missing pieces together.

cd into spark-jobserver and run this,

sbt job-server-tests/package

These are example jobs that you can find here. Now your wordcount example is built. Let's push the jar to the SJS /jars API:

 curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests.jar localhost:8090/jars/firsttest
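To verify the upload went through, you can list the jars known to SJS back from the same endpoint:

# List the jars SJS currently knows about; "firsttest" should show up here.
curl localhost:8090/jars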

3. Submitting a job:

Let's run it and get the output:

curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'

We send a request to the /jobs API with the appName and classPath. Upon every job submission SJS gives you a jobId: `"jobId": "5453779a-f004-45fc-a11d-a39dae0f9bf4"`

4. Getting the status of the job:

Call the /jobs API with that jobId to get the status/result and also the duration of your job. Also, fire up the Spark master UI to see the job getting executed.

curl localhost:8090/jobs/5453779a-f004-45fc-a11d-a39dae0f9bf4

SJS is a really nice project which makes it a ton easier to work with Apache Spark, and the production use cases look promising as well. There is also a Gitter chat room where all the SJS folks hang out and help with any kind of queries.

That's it for now. If I find time I will write about spark-jobserver in production and about using sqlContext and DataFrames with spark-jobserver. Any questions regarding spark-jobserver? Comment below or shoot me an email.