Beginning with Cascalog

I have a project where I am trying out different things... of course, in Clojure.

I stumbled upon Cascalog. It is by nathanmarz, the same person who created twitter-storm.

Cascalog is a "querying" library that runs on top of Hadoop. You can set up a Hadoop node and query your data with it. The queries look similar to SQL queries (in philosophy, not in syntax). So I thought of giving it a try. It would also be my first attempt at Hadoop :) Wanna have fun...


I stumbled upon many Hadoop tutorials but finally found one that seemed sent by God and written by an angel - at -
Hadoop-Set-Up

The only problem you are likely to encounter when running Cascalog is that it refuses to connect to Hadoop with an SSH failure. To fix that, install sshd (the SSH daemon):
 
           sudo apt-get install openssh-server
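Installing sshd alone may not be enough: Hadoop's start scripts also need passwordless SSH to localhost (you can see me fumbling through exactly this in the bash history below). A sketch of the usual key setup, assuming the default `id_rsa` file names (adjust `-f` if you keep keys elsewhere):

```shell
# Make sure the .ssh directory exists with sane permissions.
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Generate an RSA key pair with an empty passphrase, unless one already exists.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorise the public key for logins to this machine (localhost).
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Sanity check (run manually): `ssh localhost` should not ask for a password.
```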

After that, it was straightforward with the Cascalog library. I ran it in the REPL and it worked like a charm. To start, go to: Getting Started with Cascalog

I just want to add that in Cascalog:

   defmapcatop is used for splitting one of your rows into multiple rows (speaking in SQL terms).
   The example used here has a string of words; you can use a defmapcatop to generate the list of words. This is transposing from horizontal to vertical.
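To make that concrete, here is the split logic in plain Clojure (no Hadoop needed). The regex here is a simplified stand-in for the one I use later in the session; in Cascalog you would wrap such a function in defmapcatop (or defmapcatfn in the newer cascalog.logic.def) so that each input row emits one output row per element:

```clojure
(require '[clojure.string :as str])

;; One line in -> many words out. Wrapped in defmapcatop/defmapcatfn,
;; Cascalog turns each element of the returned sequence into its own
;; output tuple (row).
(defn tokenise [line]
  (str/split line #"[\s,.]+"))   ; split on runs of whitespace, commas, dots

(tokenise "Four score and seven years ago")
;; => ["Four" "score" "and" "seven" "years" "ago"]

;; Plain mapcat shows the same row-multiplication idea outside Hadoop:
(mapcat tokenise ["a b" "c d e"])
;; => ("a" "b" "c" "d" "e")
```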

Here is my bash history from this little war :) :

  467  cp shared/hadoop-2.4.1.tar.* hadoop/
  468  ls
  469  ls hadoop/
  470  cd hadoop/
  471  tar -xvf hadoop-2.4.1.tar.gz
  472  tar -xvzf hadoop-2.4.1.tar.gz
  473  ls
  474  emacs &
  475  vim set-env-vars.sh
  476  cat set-env-vars.sh
  477  ls
  478  chmod a+x set-env-vars.sh
  479  ls -l
  480  chmod 755 set-env-vars.sh
  481  ls
  482  ls -l
  483  chmod 766 set-env-vars.sh
  484  ls
  485  ls -l
  486  ls
  487  ln -s /home/XYZ/hadoop/hadoop-2.4.1 hadoop
  488  ls
  489  bash set-env-vars.sh
  490  echo $HADOOP-INSTALL
  491  cat set-env-vars.sh
  492  echo $HADOOP_INSTALL
  493  vim set-env-vars.sh
  494  cat set-env-vars.sh
  495  ./set-env-vars.sh
  496  printev
  497  printenv
  498  printenv $HADOOP_INSTALL
  499  echo $HADOOP_INSTALL
  500  cat set-env-vars.sh
  501  printenv OLDPWD
  502  printenv HADOOP_INSTALL
  503  bash set-env-vars.sh
  504  printenv HADOOP_INSTALL
  505  set $HADOOP_INSTALL
  506  echo $HADOOP_INSTALL
  507  ./set-env-vars.sh
  508  echo $HADOOP_INSTALL
  509  cat set-env-vars.sh
  510  echo HADOOP_INSTALL
  511  $HADOOP_INSTALL
  512  sudo bash set-env-vars.sh
  513  echo $HADOOP_INSTALL
  514  export HADOOP=/home/XYZ/hadoop
  515  ls
  516  echo $HADOOP
  517  ./set-env-vars.sh
  518  cat set-env-vars.sh
  519  echo $HADOOP_INSTALL
  520  source set-env-vars.sh
  521  echo $HADOOP_INSTALL
  522  script help-script.script
  523  cat help-script.script | grep "/default"
  524  # edit the core-site.xml
  525  # edit the yarn-site.xml
  526  # edit the mapred-site.xml
  527  # create namenode and datanode for hadoop
  528  mkdir ../my-store/hdfs/namenode
  529  ls
  530  mkdir my-store
  531  mkdir -r my-store/hdfs/namenode
  532  man mkdir
  533  mkdir -f my-store/hdfs/namenode
  534  mkdir --help
  535  mkdir -p my-store/hdfs/namenode
  536  mkdir -p my-store/hdfs/datanode
  537  # edit the hdfs-site.xml
  538  # format the new hadoop filesystem
  539  hdfs namenode -format
  540  ls
  541  source ~/.bashrc
  542  hdfs
  543  hdfs namenode -format
  544  # formatting of namenode need to be done only once.
  545  # otherwise it would wipe away the data
  546  # do it again before starting hadoop only
  547  start-dfs.sh
  548  ssh-keyget -t rsa -P ''
  549  ssh-keygen -t rsa -P ''
  550  cat ~/.ssh/hadoop_rsa.pub >> ~/.ssh/authorized_keys
  551  start-dfs.sh
  552  jps
  553  which ssh
  554  which sshd
  555  sudo apt-get install sshd
  556  sudo apt-get install openssh-server
  557  # install the openssh-server for sshd i.e. daemon of ssh
  558  which ssh
  559  which sshd
  560  start-dfs.sh
  561  jps
  562  script help-script.script
  563  cat help-script.script
  564  man script
  565  cat ~/.bash_history
  566  cat ~/.bash_history  | less
  567  cat ~/.bash_history  | grep "hadoop"
  568  cat ~/.bash_history  | grep "only"
  569  cat ~/.bash_history  | grep "once"
  570  cat ~/.bash_history  | grep "#"
  571  history -a
  572  history
  573  history | less
  574  man history
  575  history --help
  576  history
  577  history > help-script.script
  578  script -a help-script.script
  579  history > help-script.script
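One lesson buried in the flailing above: running `bash set-env-vars.sh` (or `./set-env-vars.sh`) executes the script in a child shell, so its exports vanish when that shell exits; only `source` runs it in the current shell and makes the variables stick. A minimal demo (the variable name is just the one from my script, the path is made up):

```shell
# A script that only exports a variable.
cat > /tmp/set-env-vars.sh <<'EOF'
export HADOOP_INSTALL=/opt/hadoop-2.4.1
EOF

# Running it spawns a child shell: the export dies with the child.
bash /tmp/set-env-vars.sh
echo "after bash:   '$HADOOP_INSTALL'"

# Sourcing runs it in the current shell: the export sticks.
source /tmp/set-env-vars.sh
echo "after source: '$HADOOP_INSTALL'"
```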
Script started on Tuesday 29 July 2014 05:17:58 PM IST
XYZ@XYZ-VirtualBox:~/hadoop$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/XYZ/hadoop/hadoop-2.4.1/logs/yarn-XYZ-resourcemanager-XYZ-VirtualBox.out
localhost: starting nodemanager, logging to /home/XYZ/hadoop/hadoop-2.4.1/logs/yarn-XYZ-nodemanager-XYZ-VirtualBox.out
XYZ@XYZ-VirtualBox:~/hadoop$ jps
6082 NodeManager
5178 DataNode
4981 NameNode
6117 Jps
5439 SecondaryNameNode
5883 ResourceManager
XYZ@XYZ-VirtualBox:~/hadoop$ exit
exit

Script done on Tuesday 29 July 2014 05:19:56 PM IST
Script started on Tuesday 29 July 2014 05:20:27 PM IST
XYZ@XYZ-VirtualBox:~/hadoop$ watch -n 5 free -m
XYZ@XYZ-VirtualBox:~/hadoop$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
XYZ@XYZ-VirtualBox:~/hadoop$ stop-dfs.sh
14/07/29 17:27:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/07/29 17:28:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
XYZ@XYZ-VirtualBox:~/hadoop$ exit
exit

Script done on Tuesday 29 July 2014 05:30:36 PM IST




// Cascalog REPL history: the word-counting example:


Script started on Wednesday 30 July 2014 04:14:01 PM IST
XYZ@XYZ-VirtualBox:~/clojure/cascalog$ lein repl
nREPL server started on port 50449 on host 127.0.0.1 - nrepl://127.0.0.1:50449
REPL-y 0.3.1
Clojure 1.5.1
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=> (use 'cascalog.api)
nil
user=> (use 'cascalog.playground)
nil
user=> (require '[cascalog.logic.def :as def])
nil
user=>

user=> (def/defmapcatfn tokenise [line]
  #_=>   (clojure.string/split line #"[\[\]\\\(\),.)\s+]"))
#'user/tokenise
user=>

user=>

user=> (require '[cascalog.logic.ops :as c])
nil
user=>

user=>

user=> (?- (stdout)
  #_=>     (<- [?line]
  #_=>         (sentence :> ?line)))
14/07/30 16:15:09 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
14/07/30 16:15:09 INFO planner.HadoopPlanner: using application jar: /home/XYZ/.m2/repository/cascading/cascading-hadoop/2.5.3/cascading-hadoop-2.5.3.jar
14/07/30 16:15:09 INFO property.AppProps: using app.id: C9B7157B563647F4A5B9DD8C2B4CC9DB
14/07/30 16:15:10 INFO util.Version: Concurrent, Inc - Cascading 2.5.3
14/07/30 16:15:10 INFO flow.Flow: [] starting
14/07/30 16:15:10 INFO flow.Flow: []  source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/b7d6055f-3789-487d-946f-4b5ac88a4f51"]
14/07/30 16:15:10 INFO flow.Flow: []  sink: StdoutTap["SequenceFile[[UNKNOWN]->['?line']]"]["/tmp/temp81179071450888796715696517475394"]
14/07/30 16:15:10 INFO flow.Flow: []  parallel execution is enabled: false
14/07/30 16:15:10 INFO flow.Flow: []  starting jobs: 1
14/07/30 16:15:10 INFO flow.Flow: []  allocating threads: 1
14/07/30 16:15:10 INFO flow.FlowStep: [] starting step: (1/1) ...1450888796715696517475394
14/07/30 16:15:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/30 16:15:11 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
14/07/30 16:15:11 INFO flow.FlowStep: [] tracking url: http://localhost:8080/
14/07/30 16:15:11 INFO util.ProcessTree: setsid exited with exit code 0
14/07/30 16:15:12 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@fc824e
14/07/30 16:15:12 INFO mapred.MapTask: numReduceTasks: 0
14/07/30 16:15:12 INFO hadoop.FlowMapper: cascading version: 2.5.3
14/07/30 16:15:12 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
14/07/30 16:15:12 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/b7d6055f-3789-487d-946f-4b5ac88a4f51"]
14/07/30 16:15:12 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?line']]"]["/tmp/temp81179071450888796715696517475394"]
14/07/30 16:15:12 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/07/30 16:15:12 INFO mapred.LocalJobRunner:
14/07/30 16:15:12 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
14/07/30 16:15:12 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp81179071450888796715696517475394
14/07/30 16:15:12 INFO mapred.LocalJobRunner:
14/07/30 16:15:12 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
14/07/30 16:15:12 INFO mapred.FileInputFormat: Total input paths to process : 1


RESULTS
-----------------------
Four score and seven years ago our fathers brought forth on this continent a new nation
conceived in Liberty and dedicated to the proposition that all men are created equal
Now we are engaged in a great civil war testing whether that nation or any nation so
conceived and so dedicated can long endure We are met on a great battlefield of that war
We have come to dedicate a portion of that field as a final resting place for those who
here gave their lives that that nation might live It is altogether fitting and proper
that we should do this
But in a larger sense we can not dedicate  we can not consecrate  we can not hallow
this ground The brave men living and dead who struggled here have consecrated it
far above our poor power to add or detract The world will little note nor long remember
what we say here but it can never forget what they did here It is for us the living rather
to be dedicated here to the unfinished work which they who fought here have thus far so nobly
advanced It is rather for us to be here dedicated to the great task remaining before us
that from these honored dead we take increased devotion to that cause for which they gave
the last full measure of devotion  that we here highly resolve that these dead shall
not have died in vain  that this nation under God shall have a new birth of freedom
and that government of the people by the people for the people shall not perish
from the earth
-----------------------
14/07/30 16:15:12 INFO util.Hadoop18TapUtil: deleting temp path /tmp/temp81179071450888796715696517475394/_temporary
nil
user=>

user=> (?- (stdout)
  #_=>     (<- [?word]
  #_=>         (sentence :> ?line)
  #_=>         (tokenise :< ?line :> ?word)))
14/07/30 16:15:14 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
14/07/30 16:15:14 INFO planner.HadoopPlanner: using application jar: /home/XYZ/.m2/repository/cascading/cascading-hadoop/2.5.3/cascading-hadoop-2.5.3.jar
14/07/30 16:15:14 INFO flow.Flow: [] starting
14/07/30 16:15:14 INFO flow.Flow: []  source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/840fe467-c73b-4b4c-80bc-727a8ea52ce7"]
14/07/30 16:15:14 INFO flow.Flow: []  sink: StdoutTap["SequenceFile[[UNKNOWN]->['?word']]"]["/tmp/temp273045733006808956615702592648076"]
14/07/30 16:15:14 INFO flow.Flow: []  parallel execution is enabled: false
14/07/30 16:15:14 INFO flow.Flow: []  starting jobs: 1
14/07/30 16:15:14 INFO flow.Flow: []  allocating threads: 1
14/07/30 16:15:14 INFO flow.FlowStep: [] starting step: (1/1) ...3006808956615702592648076
14/07/30 16:15:14 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
14/07/30 16:15:14 INFO flow.FlowStep: [] tracking url: http://localhost:8080/
14/07/30 16:15:14 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1737be7
14/07/30 16:15:14 INFO mapred.MapTask: numReduceTasks: 0
14/07/30 16:15:15 INFO hadoop.FlowMapper: cascading version: 2.5.3
14/07/30 16:15:15 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
14/07/30 16:15:15 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/840fe467-c73b-4b4c-80bc-727a8ea52ce7"]
14/07/30 16:15:15 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?word']]"]["/tmp/temp273045733006808956615702592648076"]
14/07/30 16:15:15 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
14/07/30 16:15:15 INFO mapred.LocalJobRunner:
14/07/30 16:15:15 INFO mapred.Task: Task attempt_local_0002_m_000000_0 is allowed to commit now
14/07/30 16:15:15 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_m_000000_0' to file:/tmp/temp273045733006808956615702592648076
14/07/30 16:15:15 INFO mapred.LocalJobRunner:
14/07/30 16:15:15 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
14/07/30 16:15:15 INFO mapred.FileInputFormat: Total input paths to process : 1


RESULTS
-----------------------
Four
score
and
seven
years
ago
our
fathers
brought
forth
on
this
continent
a
new
nation
conceived
in
Liberty
and
dedicated
to
the
proposition
that
all
men
are
created
equal
Now
we
are
engaged
in
a
great
civil
war
testing
whether
that
nation
or
any
nation
so
conceived
and
so
dedicated
can
long
endure
We
are
met
on
a
great
battlefield
of
that
war
We
have
come
to
dedicate
a
portion
of
that
field
as
a
final
resting
place
for
those
who
here
gave
their
lives
that
that
nation
might
live
It
is
altogether
fitting
and
proper
that
we
should
do
this
But
in
a
larger
sense
we
can
not
dedicate

we
can
not
consecrate

we
can
not
hallow
this
ground
The
brave
men
living
and
dead
who
struggled
here
have
consecrated
it
far
above
our
poor
power
to
add
or
detract
The
world
will
little
note
nor
long
remember
what
we
say
here
but
it
can
never
forget
what
they
did
here
It
is
for
us
the
living
rather
to
be
dedicated
here
to
the
unfinished
work
which
they
who
fought
here
have
thus
far
so
nobly
advanced
It
is
rather
for
us
to
be
here
dedicated
to
the
great
task
remaining
before
us
that
from
these
honored
dead
we
take
increased
devotion
to
that
cause
for
which
they
gave
the
last
full
measure
of
devotion

that
we
here
highly
resolve
that
these
dead
shall
not
have
died
in
vain

that
this
nation
under
God
shall
have
a
new
birth
of
freedom
and
that
government
of
the
people
by
the
people
for
the
people
shall
not
perish
from
the
earth
-----------------------
14/07/30 16:15:15 INFO util.Hadoop18TapUtil: deleting temp path /tmp/temp273045733006808956615702592648076/_temporary
nil
user=>

user=> (?- (stdout)
  #_=>     (<- [?word ?count]
  #_=>         (sentence :> ?line)
  #_=>         (tokenise :< ?line :> ?word)
  #_=>         (c/count :> ?count)))
14/07/30 16:15:17 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
14/07/30 16:15:17 INFO planner.HadoopPlanner: using application jar: /home/XYZ/.m2/repository/cascading/cascading-hadoop/2.5.3/cascading-hadoop-2.5.3.jar
14/07/30 16:15:17 INFO flow.Flow: [] starting
14/07/30 16:15:17 INFO flow.Flow: []  source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/80b7d1fb-93bf-4947-9026-9ccb40f44326"]
14/07/30 16:15:17 INFO flow.Flow: []  sink: StdoutTap["SequenceFile[[UNKNOWN]->['?word', '?count']]"]["/tmp/temp128925885307038237815705521554123"]
14/07/30 16:15:17 INFO flow.Flow: []  parallel execution is enabled: false
14/07/30 16:15:17 INFO flow.Flow: []  starting jobs: 1
14/07/30 16:15:17 INFO flow.Flow: []  allocating threads: 1
14/07/30 16:15:17 INFO flow.FlowStep: [] starting step: (1/1) ...5307038237815705521554123
14/07/30 16:15:17 INFO flow.FlowStep: [] submitted hadoop job: job_local_0003
14/07/30 16:15:17 INFO flow.FlowStep: [] tracking url: http://localhost:8080/
14/07/30 16:15:17 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1726ac1
14/07/30 16:15:17 INFO mapred.MapTask: numReduceTasks: 1
14/07/30 16:15:17 INFO mapred.MapTask: io.sort.mb = 100
14/07/30 16:15:17 INFO mapred.MapTask: data buffer = 79691776/99614720
14/07/30 16:15:17 INFO mapred.MapTask: record buffer = 262144/327680
14/07/30 16:15:17 INFO hadoop.FlowMapper: cascading version: 2.5.3
14/07/30 16:15:17 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
14/07/30 16:15:18 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/80b7d1fb-93bf-4947-9026-9ccb40f44326"]
14/07/30 16:15:18 INFO hadoop.FlowMapper: sinking to: GroupBy(d56a05ef-4117-4b3c-9a92-70d8e8a341e0)[by:[{1}:'?word']]
14/07/30 16:15:18 INFO assembly.AggregateBy: using threshold value: 10000
14/07/30 16:15:18 INFO mapred.MapTask: Starting flush of map output
14/07/30 16:15:18 INFO mapred.MapTask: Finished spill 0
14/07/30 16:15:18 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
14/07/30 16:15:18 INFO mapred.LocalJobRunner:
14/07/30 16:15:18 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
14/07/30 16:15:18 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e6356d
14/07/30 16:15:18 INFO mapred.LocalJobRunner:
14/07/30 16:15:18 INFO mapred.Merger: Merging 1 sorted segments
14/07/30 16:15:18 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2422 bytes
14/07/30 16:15:18 INFO mapred.LocalJobRunner:
14/07/30 16:15:18 INFO hadoop.FlowReducer: cascading version: 2.5.3
14/07/30 16:15:18 INFO hadoop.FlowReducer: child jvm opts: -Xmx200m
14/07/30 16:15:18 INFO hadoop.FlowReducer: sourcing from: GroupBy(d56a05ef-4117-4b3c-9a92-70d8e8a341e0)[by:[{1}:'?word']]
14/07/30 16:15:18 INFO hadoop.FlowReducer: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?word', '?count']]"]["/tmp/temp128925885307038237815705521554123"]
14/07/30 16:15:18 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
14/07/30 16:15:18 INFO mapred.LocalJobRunner:
14/07/30 16:15:18 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
14/07/30 16:15:18 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to file:/tmp/temp128925885307038237815705521554123
14/07/30 16:15:18 INFO mapred.LocalJobRunner: reduce > reduce
14/07/30 16:15:18 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
14/07/30 16:15:18 INFO mapred.FileInputFormat: Total input paths to process : 1


RESULTS
-----------------------
    4
But    1
Four    1
God    1
It    3
Liberty    1
Now    1
The    2
We    2
a    7
above    1
add    1
advanced    1
ago    1
all    1
altogether    1
and    6
any    1
are    3
as    1
battlefield    1
be    2
before    1
birth    1
brave    1
brought    1
but    1
by    1
can    5
cause    1
civil    1
come    1
conceived    2
consecrate    1
consecrated    1
continent    1
created    1
dead    3
dedicate    2
dedicated    4
detract    1
devotion    2
did    1
died    1
do    1
earth    1
endure    1
engaged    1
equal    1
far    2
fathers    1
field    1
final    1
fitting    1
for    5
forget    1
forth    1
fought    1
freedom    1
from    2
full    1
gave    2
government    1
great    3
ground    1
hallow    1
have    5
here    8
highly    1
honored    1
in    4
increased    1
is    3
it    2
larger    1
last    1
little    1
live    1
lives    1
living    2
long    2
measure    1
men    2
met    1
might    1
nation    5
never    1
new    2
nobly    1
nor    1
not    5
note    1
of    5
on    2
or    2
our    2
people    3
perish    1
place    1
poor    1
portion    1
power    1
proper    1
proposition    1
rather    2
remaining    1
remember    1
resolve    1
resting    1
say    1
score    1
sense    1
seven    1
shall    3
should    1
so    3
struggled    1
take    1
task    1
testing    1
that    13
the    9
their    1
these    2
they    3
this    4
those    1
thus    1
to    8
under    1
unfinished    1
us    3
vain    1
war    2
we    8
what    2
whether    1
which    2
who    3
will    1
work    1
world    1
years    1
-----------------------
14/07/30 16:15:18 INFO util.Hadoop18TapUtil: deleting temp path /tmp/temp128925885307038237815705521554123/_temporary
nil
user=>

user=> 14/07/30 16:15:41 INFO util.Update: newer Cascading release available: 2.5.5


user=> (quit)
Bye for now!
XYZ@XYZ-VirtualBox:~/clojure/cascalog$ exit
exit

Script done on Wednesday 30 July 2014 04:29:33 PM IST
