<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5950739624271007232</id><updated>2011-07-08T02:32:26.735-07:00</updated><category term='alias'/><category term='object system'/><category term='path'/><category term='erlang'/><category term='absolute'/><category term='functional'/><category term='arc'/><category term='history'/><category term='perl'/><category term='performance'/><category term='article'/><category term='lisp'/><category term='language'/><category term='benchmark'/><category term='multi-core'/><category term='bash'/><category term='utility'/><category term='hipe'/><category term='Moose'/><title type='text'>Pichi's blog</title><subtitle type='html'>One size doesn't fit all.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>17</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-4495072779053920849</id><published>2011-05-29T05:06:00.000-07:00</published><updated>2011-05-29T05:20:23.319-07:00</updated><title type='text'>Are Perl and Erlang functional enough for Nemo?</title><content type='html'>I came across an interesting &lt;a href="http://stackoverflow.com/questions/6166155/is-scala-a-functional-programming-language/6166222#6166222"&gt;litmus test for a functional language&lt;/a&gt; from &lt;a href="http://stackoverflow.com"&gt;stackoverflow&lt;/a&gt; user &lt;a href="http://stackoverflow.com/users/768469/nemo"&gt;Nemo&lt;/a&gt;. Just out of curiosity, here is my attempt in Perl:&lt;pre&gt;
sub Inc{1+shift}
sub Thrice {
  my $f=shift;
  sub {
    $f-&amp;gt;($f-&amp;gt;($f-&amp;gt;(shift)))
  }
}
$\ = "\n";
print for Thrice(\&amp;amp;Inc)-&amp;gt;(0),
    Thrice(Thrice(\&amp;amp;Inc))-&amp;gt;(0),
    Thrice(\&amp;amp;Thrice)-&amp;gt;(\&amp;amp;Inc)-&amp;gt;(0);
&lt;/pre&gt;
And in Erlang:&lt;pre&gt;1&amp;gt; Thrice = fun(F) -&amp;gt; fun(X) -&amp;gt; F(F(F(X))) end end.
#Fun&amp;lt;erl_eval.6.13229925&amp;gt;
2&amp;gt; Inc = fun(X) -&amp;gt; 1+X end.
#Fun&amp;lt;erl_eval.6.13229925&amp;gt;
3&amp;gt; (Thrice(Inc))(0).
3
4&amp;gt; (Thrice(Thrice(Inc)))(0).
9
5&amp;gt; ((Thrice(Thrice))(Inc))(0).
27&lt;/pre&gt;
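Both versions hard-code exactly three applications. The sketch below generalizes Thrice in Perl to a function that applies f any number of times. The name &lt;code&gt;Times&lt;/code&gt; is hypothetical and not part of the original code. &lt;code&gt;Times(3)&lt;/code&gt; behaves like Thrice and reproduces the same three results:

```perl
use strict;
use warnings;

# Hypothetical generalization of Thrice: Times($n) returns a function that
# takes $f and returns a new function applying $f exactly $n times.
sub Times {
    my $n = shift;
    return sub {
        my $f = shift;
        return sub {
            my $x = shift;
            $x = $f->($x) for 1 .. $n;
            return $x;
        };
    };
}

my $Inc    = sub { 1 + shift };
my $Thrice = Times(3);    # behaves like Thrice from the post

print $Thrice->($Inc)->(0), "\n";              # 3
print $Thrice->($Thrice->($Inc))->(0), "\n";   # 9
print $Thrice->($Thrice)->($Inc)->(0), "\n";   # 27
```

Thrice applied to itself applies Thrice three times to Inc, which gives 3 × 3 × 3 = 27 increments.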
Both seem functional enough for my personal taste.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-4495072779053920849?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/4495072779053920849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=4495072779053920849' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/4495072779053920849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/4495072779053920849'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2011/05/are-perl-and-erlang-functional-enough.html' title='Are Perl and Erlang functional enough for Nemo?'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-8188382934091244661</id><published>2009-10-06T06:36:00.000-07:00</published><updated>2009-10-06T07:09:04.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='object system'/><category scheme='http://www.blogger.com/atom/ns#' term='functional'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><title type='text'>Closure vs. Object in Perl</title><content type='html'>&lt;p&gt;I was curious how closures perform compared with objects in Perl. 
I tested it on a counter, the simplest possible abstraction with state.&lt;/p&gt;
&lt;pre&gt;
use Benchmark qw(cmpthese);

sub make_counter {
    my $counter = shift;
    return sub { $counter++ };
}

package counter;

sub new {
    my ( $class, $counter ) = @_;
    return bless \$counter, $class;
}

sub inc { ${ shift() }++ }

package main;

our $inc = make_counter(1);

our $counter = counter-&gt;new(1);

cmpthese(
    -5,
    {   closure_make =&gt; q{make_counter(1)},
        object_make  =&gt; q{counter-&gt;new(1)}
    }
);

cmpthese(
    -5,
    {   closure =&gt; q{$main::inc-&gt;()},
        method  =&gt; q{$main::counter-&gt;inc()}
    }
);
&lt;/pre&gt;
&lt;p&gt;And results are:&lt;/p&gt;
&lt;pre&gt;
                 Rate closure_make  object_make
closure_make 420045/s           --         -35%
object_make  643969/s          53%           --
             Rate  method closure
method  2397172/s      --    -35%
closure 3697681/s     54%      --
&lt;/pre&gt;
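&lt;p&gt;These numbers suggest a pattern: construct the closure once, outside the hot loop, and only call it inside the loop. The sketch below shows that pattern. The helper &lt;code&gt;sum_calls&lt;/code&gt; is hypothetical and stands for any hot loop that receives a code reference.&lt;/p&gt;

```perl
use strict;
use warnings;

# The closure-based counter from the benchmark above.
sub make_counter {
    my $counter = shift;
    return sub { $counter++ };
}

# Hypothetical hot loop: it only calls the code reference it is given,
# so no closure is constructed inside the loop.
sub sum_calls {
    my ( $next, $iterations ) = @_;
    my $sum = 0;
    $sum += $next->() for 1 .. $iterations;
    return $sum;
}

my $next  = make_counter(1);          # constructed once, outside the loop
my $total = sum_calls( $next, 10 );   # adds 1 + 2 + ... + 10
print "$total\n";                     # 55
```

&lt;p&gt;The cheap operation (calling) happens in the loop. The expensive one (construction) happens only once.&lt;/p&gt;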
&lt;p&gt;Well, it is not that simple. Both approaches have their benefits. Calling a closure is about 54% faster than calling a method, but creating a closure is about 35% slower than creating an object. If you want the best performance with a little abstraction, pass a closure into your hot loop instead of an object. If you would have to create the closure inside the loop, try to avoid that or switch to an object, although you should not need to construct anything in a hot loop anyway. I'm a little surprised that closure construction is so expensive in Perl. I knew that bless is not a cheap operation, but a closure?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-8188382934091244661?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/8188382934091244661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=8188382934091244661' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/8188382934091244661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/8188382934091244661'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2009/10/closure-vs-object-in-perl.html' title='Closure vs. 
Object in Perl'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-6953759475145304444</id><published>2009-02-15T09:22:00.000-08:00</published><updated>2009-02-16T01:13:54.072-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='object system'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><category scheme='http://www.blogger.com/atom/ns#' term='Moose'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmark'/><title type='text'>How fast or slow is Moose?</title><content type='html'>&lt;p&gt;&lt;a href="http://www.iinteractive.com/moose/"&gt;Moose&lt;/a&gt; is &lt;i&gt;a postmodern object system for Perl 5 that takes the tedium out of writing object-oriented Perl. It borrows all the best features from Perl 6, CLOS (LISP), Smalltalk, Java, BETA, OCaml, Ruby and more, while still keeping true to its Perl 5 roots.&lt;/i&gt; It is very powerful, and I was curious how fast it is in its current state. I have read that Moose is slow, but all the articles I found are about two years old. What matters most to me is runtime speed, not compile time, so my benchmark focuses only on runtime. I also dislike combined getter/setter accessors, so my tests cover only separate getters and setters. The benchmark code follows.&lt;/p&gt;
&lt;pre&gt;#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use Benchmark qw(:all :hireswallclock);

{

 package MooseClassMutable;
 use Moose;

 has var =&gt; (
  is       =&gt; 'ro',
  reader   =&gt; 'get_var',
  writer   =&gt; 'set_var',
  required =&gt; 1
 );

}
{

 package MooseClassImmutable;
 use Moose;

 has var =&gt; (
  is       =&gt; 'ro',
  reader   =&gt; 'get_var',
  writer   =&gt; 'set_var',
  required =&gt; 1
 );

 no Moose;
 __PACKAGE__-&gt;meta-&gt;make_immutable;
}
{

 package PerlClass;

 sub new {
  my ( $class, %args ) = @_;
  die 'var value must be set' unless exists $args{var};
  return bless \%args, $class;
 }

 sub get_var { shift()-&gt;{var} }

 sub set_var { $_[0]-&gt;{var} = $_[1] }
}

{

 package MooseClassFast;
 use Moose;
 with 'MooseX::Emulate::Class::Accessor::Fast';

 has var =&gt; (
  is       =&gt; 'ro',
  required =&gt; 1
 );

 __PACKAGE__-&gt;follow_best_practice;
 __PACKAGE__-&gt;mk_accessors('var');

 no Moose;
 __PACKAGE__-&gt;meta-&gt;make_immutable;
}

cmpthese(
 -5,
 {   map {
   my $class = $_;
   "new $class" =&gt; sub { $class-&gt;new( var =&gt; 1 ) for 1 .. 1000 }
   } qw(MooseClassMutable MooseClassImmutable PerlClass MooseClassFast)
 }
);
my %objs = ( map { $_ =&gt; $_-&gt;new( var =&gt; 1 ) }
  qw(MooseClassMutable MooseClassImmutable PerlClass MooseClassFast) );
cmpthese(
 -5,
 {   map {
   my $class = $_;
   my $obj   = $objs{$class};
   "get $class" =&gt; sub { $obj-&gt;get_var() for 1 .. 1000 }
   } qw(MooseClassMutable MooseClassImmutable PerlClass MooseClassFast)
 }
);
cmpthese(
 -5,
 {   map {
   my $class = $_;
   my $obj   = $objs{$class};
   "set $class" =&gt; sub { $obj-&gt;set_var(1) for 1 .. 1000 }
   } qw(MooseClassMutable MooseClassImmutable PerlClass MooseClassFast)
 }
);&lt;/pre&gt;&lt;p&gt;I tested Moose versions 0.54 and 0.68 and, out of curiosity, also the &lt;code&gt;Class::Accessor::Fast&lt;/code&gt; emulation, which works only with Moose 0.68. Note that the rates are in thousands of operations per second, because every benchmarked sub repeats its operation 1000 times. The Moose 0.54 results come first.&lt;/p&gt;
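&lt;p&gt;Every benchmarked sub performs 1000 operations per iteration, so converting a rate from the tables into operations per second is a single multiplication. The sketch below does that conversion. &lt;code&gt;ops_per_second&lt;/code&gt; is a hypothetical helper, and the sample rates are copied from the Moose 0.68 tables.&lt;/p&gt;

```perl
use strict;
use warnings;

# Each benchmarked sub runs its operation 1000 times ("for 1 .. 1000"),
# so one Benchmark iteration per second means 1000 operations per second.
use constant OPS_PER_ITERATION => 1000;

# Hypothetical helper: Benchmark rate (iterations/s) to operations/s.
sub ops_per_second {
    my ($rate) = @_;
    return $rate * OPS_PER_ITERATION;
}

# Sample rates copied from the Moose 0.68 tables.
my %rate = (
    'new MooseClassImmutable' => 190,
    'get MooseClassMutable'   => 1754,
    'set MooseClassImmutable' => 1680,
    'get PerlClass'           => 2261,
);

for my $name ( sort keys %rate ) {
    printf "%-24s %.2f million ops/s\n", $name, ops_per_second( $rate{$name} ) / 1e6;
}
```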
&lt;pre&gt;                          Rate new MooseClassMutable new MooseClassImmutable new PerlClass
new MooseClassMutable   6.85/s                    --                    -96%          -98%
new MooseClassImmutable  192/s                 2697%                      --          -53%
new PerlClass            403/s                 5790%                    111%            --
                          Rate get MooseClassMutable get MooseClassImmutable get PerlClass
get MooseClassMutable   1716/s                    --                     -2%          -25%
get MooseClassImmutable 1754/s                    2%                      --          -23%
get PerlClass           2273/s                   32%                     30%            --
                          Rate set MooseClassImmutable set MooseClassMutable set PerlClass
set MooseClassImmutable 1611/s                      --                   -2%          -16%
set MooseClassMutable   1643/s                      2%                    --          -14%
set PerlClass           1916/s                     19%                   17%            --
&lt;/pre&gt;&lt;p&gt;The Moose 0.68 results follow.&lt;/p&gt;
&lt;pre&gt;                          Rate new MooseClassMutable new MooseClassFast new MooseClassImmutable new PerlClass
new MooseClassMutable   15.4/s                    --               -84%                    -92%          -96%
new MooseClassFast      98.7/s                  541%                 --                    -48%          -76%
new MooseClassImmutable  190/s                 1138%                93%                      --          -54%
new PerlClass            412/s                 2579%               318%                    116%            --
                          Rate get MooseClassFast get MooseClassImmutable get MooseClassMutable get PerlClass
get MooseClassFast      1716/s                 --                     -2%                   -2%          -24%
get MooseClassImmutable 1743/s                 2%                      --                   -1%          -23%
get MooseClassMutable   1754/s                 2%                      1%                    --          -22%
get PerlClass           2261/s                32%                     30%                   29%            --
                          Rate set MooseClassFast set MooseClassMutable set MooseClassImmutable set PerlClass
set MooseClassFast      78.6/s                 --                  -95%                    -95%          -96%
set MooseClassMutable   1659/s              2011%                    --                     -1%          -15%
set MooseClassImmutable 1680/s              2038%                    1%                      --          -14%
set PerlClass           1950/s              2381%                   18%                     16%            --
&lt;/pre&gt;&lt;p&gt;Moose seems fast enough for me. Considering how powerful Moose is, the results are great. In most cases I can freeze the class definition using &lt;code&gt;make_immutable&lt;/code&gt;, and 190 thousand object constructions per second is enough. There is also a big improvement in the mutable version between 0.54 and 0.68 (from 6.85 to 15.4 thousand per second), and 15 thousand per second is not terrible. Moose accessors are really fast, and &lt;code&gt;make_immutable&lt;/code&gt; has no measurable impact on them. 1.7 million reads and 1.6 million writes per second is enough, and my ugly handcrafted accessors don't make a big difference here (2.2 million reads/s and 1.9 million writes/s). The &lt;code&gt;Class::Accessor::Fast&lt;/code&gt; setter result (78.6 thousand per second) is strangely slow, and I'm curious why. Anyway, Moose itself performs well and there is no reason to avoid it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-6953759475145304444?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/6953759475145304444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=6953759475145304444' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/6953759475145304444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/6953759475145304444'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2009/02/how-fast-or-slow-is-moose.html' title='How fast or slow is Moose?'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-9178409651539561606</id><published>2009-01-28T02:49:00.000-08:00</published><updated>2009-02-04T01:09:20.418-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bash'/><category scheme='http://www.blogger.com/atom/ns#' term='alias'/><category scheme='http://www.blogger.com/atom/ns#' term='utility'/><category scheme='http://www.blogger.com/atom/ns#' term='absolute'/><category scheme='http://www.blogger.com/atom/ns#' term='path'/><title type='text'>Absolute path resolver</title><content type='html'>I was looking for a utility to resolve the absolute path of a file or directory. I found &lt;code&gt;namei&lt;/code&gt;, but it returns its result in this form:&lt;pre&gt;
$ namei /etc/gdc
f: /etc/gdc
d /
d etc
l gdc -&gt; /home/hynek/.gdc
d /
d home
d hynek
l .gdc -&gt; work/tiger/bear/common/etc/gdc
  d work
  d tiger
  l bear -&gt; bear.trunk/
    d bear.trunk
  d common
  d etc
  d gdc&lt;/pre&gt;but I expected something more like&lt;pre&gt;
$ abs_path /etc/gdc
/home/hynek/work/tiger/bear.trunk/common/etc/gdc
&lt;/pre&gt;I haven't found anything better than making an alias in my &lt;code&gt;~/.bashrc&lt;/code&gt;:&lt;pre&gt;
alias abs_path='perl -MCwd -le'\''print Cwd::abs_path($_) foreach @ARGV'\'&lt;/pre&gt;
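&lt;p&gt;The same idea works as a standalone Perl script. The sketch below includes a self-check: it creates a temporary directory containing a symlink and verifies that &lt;code&gt;Cwd::abs_path&lt;/code&gt; resolves the link to the real directory. &lt;code&gt;resolve_all&lt;/code&gt; is a hypothetical helper name, and the check assumes a system with symlink support.&lt;/p&gt;

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: resolve every given path the way the alias does.
sub resolve_all {
    return map { abs_path($_) } @_;
}

# Script usage, like the alias: print the absolute path of each argument.
print "$_\n" for resolve_all(@ARGV);

# Self-check: a symlink inside a temporary directory must resolve to its target.
# (Assumes symlink support; the directory is removed at exit.)
my $dir    = abs_path( tempdir( CLEANUP => 1 ) );
my $target = File::Spec->catdir( $dir, 'real' );
my $link   = File::Spec->catfile( $dir, 'link' );
mkdir $target or die "mkdir $target: $!";
symlink( $target, $link ) or die "symlink $link: $!";

my ($resolved) = resolve_all($link);
print "$link -> $resolved\n";
```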
&lt;p&gt;&lt;strong&gt;Edit:&lt;/strong&gt; As ZD pointed out, &lt;code&gt;readlink -f&lt;/code&gt; does the same job. Thanks. &lt;code&gt;readlink -m&lt;/code&gt; and &lt;code&gt;readlink -e&lt;/code&gt; work as well. The three options differ only in how they treat missing path components: &lt;code&gt;-e&lt;/code&gt; requires every component to exist, &lt;code&gt;-m&lt;/code&gt; requires none, and &lt;code&gt;-f&lt;/code&gt; requires all but the last one. I have decided to use &lt;/p&gt;&lt;pre&gt;alias abs_path='readlink -m'&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-9178409651539561606?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/9178409651539561606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=9178409651539561606' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/9178409651539561606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/9178409651539561606'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2009/01/absolute-path-resolver.html' title='Absolute path resolver'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-1616827181540791280</id><published>2009-01-14T05:22:00.000-08:00</published><updated>2009-01-15T00:42:23.860-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='erlang'/><category scheme='http://www.blogger.com/atom/ns#' term='history'/><title type='text'>Erlang history summary by J. 
Armstrong</title><content type='html'>&lt;p&gt;&lt;a href="http://www.sics.se/~joe/"&gt;Joe Armstrong&lt;/a&gt; posted a nice &lt;a href="http://www.erlang.org/pipermail/erlang-questions/2009-January/041016.html"&gt;summary&lt;/a&gt; of Erlang history.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;The transition was easy - they paid to do this. It became a real shipping project when they decided to use Erlang for the AXD301 - at that stage they put in the necessary $$$'s.&lt;/p&gt; &lt;p&gt;Now why did they choose Erlang for this project? - because all other alternatives had failed - ie it was not the strength of Erlang that was the deciding factor - rather the non-existence of alternatives.&lt;/p&gt; &lt;p&gt;Now how come the Erlang stuff was developed in the first place?&lt;/p&gt; &lt;p&gt;This was a happy accident - In the early 1980's a computer science lab was formed - most of the guys in the newly formed lab had zero experience with technology transfer, so we all thought that all we had to do was "invert stuff" and then "sell the idea to the management" nobody told us that this was like permanently banging your hand against a brick wall.&lt;/p&gt; &lt;p&gt;Inventing stuff is the easy bit ...&lt;/p&gt; &lt;p&gt;The selling stuff was tricky - we were very bad at this but very optimistic (still am :-) - we made all the classic mistakes - insulting people - getting into technical wars -&lt;/p&gt; &lt;p&gt;The turning point came when Erlang was banned - at the time we were very pissed off but like most carefull considered management decsions the net result was the exact opposite of what was planned - the consequences of the ban were difficult to forsee - but chaos was created - so things changed rapidly.&lt;/p&gt; &lt;p&gt;Thinking back the *important* things were: &lt;ul&gt;&lt;li&gt;enthusiasm and optimism (believe in what you do)&lt;/li&gt;&lt;li&gt;serendipity&lt;/li&gt;&lt;li&gt;chaos&lt;/li&gt;&lt;li&gt;smart 
people&lt;/li&gt;&lt;li&gt;finance&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt; &lt;p&gt;I think we systematically under-rate the significance of chance and chaos.  Most significant change takes place in very short time periods of chaos. Erlang had many periods when nothing happened for years then rapid changes could take place in very short time periods, always when a crisis occurred (ie Erlang was banned, a big project failed etc).&lt;/p&gt; &lt;p&gt;Moral - forget about careful planning and move quickly when a crisis occurs - trust your gut feelings.&lt;/p&gt; &lt;p&gt;Cheers&lt;/p&gt; &lt;p&gt;/Joe Armstrong&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-1616827181540791280?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/1616827181540791280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=1616827181540791280' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/1616827181540791280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/1616827181540791280'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2009/01/erlang-history-summary-by-j-armstrong.html' title='Erlang history summary by J. 
Armstrong'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-2478254696091187279</id><published>2009-01-13T07:19:00.000-08:00</published><updated>2009-01-14T05:38:05.996-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='erlang'/><category scheme='http://www.blogger.com/atom/ns#' term='history'/><category scheme='http://www.blogger.com/atom/ns#' term='article'/><title type='text'>History of Erlang</title><content type='html'>&lt;a href="http://www.sics.se/~joe/"&gt;Joe Armstrong&lt;/a&gt; wrote a nice article about &lt;a href="http://www.cs.chalmers.se/Cs/Grundutb/Kurser/ppxt/HT2007/general/languages/armstrong-erlang_history.pdf"&gt;Erlang History&lt;/a&gt;. The whole article is worth reading, but I found its end the funniest and nicest part. &lt;blockquote&gt;&lt;h4&gt;6.4 Finally&lt;/h4&gt; &lt;p&gt;It is perhaps interesting to note that the two most significant factors that led to the spread of Erlang were:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;The collapse of the AXE-N project.&lt;/li&gt; &lt;li&gt;The Erlang ban.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Both of these factors were outside our control and were unplanned. These factors were far more significant than all the things we did plan for and were within our control. We were fortuitously able to take advantage of the collapse of the AXE-N project by rushing in when the project failed. That we were able to do so was more a matter of luck than planning. Had the collapse occurred at a different site then this would not have happened. We were able to step in only because the collapse of the project happened in the building where we worked so we knew all about it. 
Eventually Ericsson did the right thing (using the right technology for the job) for the wrong reasons (competing technologies failed). One day I hope they will do the right things for the right reasons.&lt;/p&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-2478254696091187279?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/2478254696091187279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=2478254696091187279' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/2478254696091187279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/2478254696091187279'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2009/01/history-of-erlang.html' title='History of Erlang'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-5445907093081613228</id><published>2008-12-17T03:00:00.000-08:00</published><updated>2008-12-17T03:29:15.173-08:00</updated><title type='text'>My interests</title><content type='html'>&lt;p&gt;&lt;a href="http://www.met.cz/"&gt;Matin Hassman&lt;/a&gt; in his &lt;a href="http://met.blog.root.cz/"&gt;blog&lt;/a&gt; &lt;a href="http://met.blog.root.cz/2008/12/16/slova-slova-slova-nejen-v-mraku-danskem/"&gt;post&lt;/a&gt; wrote about the &lt;a 
href="http://www.wordle.net/"&gt;Wordle&lt;/a&gt;. It is a nice toy tool for generating “word clouds” from text. I have generated two, one from my &lt;a href="http://delicious.com/HynekPichi"&gt;bookmarks&lt;/a&gt; and one from my &lt;a href="http://www.google.com/reader/shared/09422360554241869353"&gt;shared items&lt;/a&gt; in Google Reader.&lt;/p&gt;
&lt;h3&gt;Bookmarks&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.wordle.net/gallery/wrdl/392526/Bookmarks" title="Wordle: Bookmarks"&gt;&lt;img src="http://www.wordle.net/thumb/wrdl/392526/Bookmarks" style="border: 1px solid rgb(221, 221, 221); padding: 4px;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Shared Items&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.wordle.net/gallery/wrdl/392565/shared_items" title="Wordle: shared items"&gt;&lt;img src="http://www.wordle.net/thumb/wrdl/392565/shared_items" style="border: 1px solid rgb(221, 221, 221); padding: 4px;" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-5445907093081613228?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/5445907093081613228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=5445907093081613228' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5445907093081613228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5445907093081613228'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2008/12/my-interests.html' title='My interests'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-548660326747491046</id><published>2008-09-03T09:01:00.000-07:00</published><updated>2008-11-11T06:46:29.697-08:00</updated><title type='text'>Beust Challenge in Erlang</title><content type='html'>There was a &lt;a href="http://beust.com/weblog/archives/000491.html"&gt;challenge posted by Cedric Beust&lt;/a&gt;. I tried to reuse my older permutation generator solution and apply the same approach to this problem. Here is the result:
&lt;pre&gt;
-module(cbchallenge).

-export([test/1]).

combine(Sufix, 0, Acc, _L) -&gt;
   accept(list_to_integer(lists:reverse(Sufix)), Acc);
combine(Sufix, N, Acc, L) -&gt;
   combine(Sufix, N, Acc, L, []).

combine(_Sufix, _N, Acc, [], _D) -&gt; Acc;
combine(Sufix, N, Acc, [X | T], D) -&gt;
   combine(Sufix, N,
     _NewAcc = combine([X | Sufix], N - 1, Acc,
         lists:reverse(D, T)),
     T, [X | D]).

accept(X, {Count, undefined}) -&gt; {Count + 1, {X, 0}};
accept(X, {Count, {Last, MaxDistance}})
   when X - Last &gt; MaxDistance -&gt;
   {Count + 1, {X, X - Last}};
accept(X, {Count, {_, MaxDistance}}) -&gt;
   {Count + 1, {X, MaxDistance}}.

count(Log10) -&gt;
   {Count, {_, MaxDistance}} = combine([], Log10,
     {0, undefined}, lists:seq($1, $9),
     [$0]),
   {Count, MaxDistance}.

test(MaxLog10) -&gt;
   lists:foldl(fun (Log10, AccIn) -&gt;
   collectResult(count(Log10), AccIn)
  end,
  {0, 0}, lists:seq(1, MaxLog10)).

collectResult({Count, Max}, {OldCount, OldMax})
   when Max &gt; OldMax -&gt;
   {OldCount + Count, Max};
collectResult({Count, _}, {OldCount, Max}) -&gt;
   {OldCount + Count, Max}.&lt;/pre&gt;
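For comparison, here is a rough Perl sketch of the same idea. It is not a translation of Kevin's or crazybob's code. Numbers with distinct digits are built digit by digit from the digits not used yet, so for a fixed length they come out in increasing order. The biggest jump is tracked per length, as the Erlang &lt;code&gt;accept/2&lt;/code&gt; does. The names &lt;code&gt;each_distinct&lt;/code&gt; and &lt;code&gt;beust&lt;/code&gt; are hypothetical.

```perl
use strict;
use warnings;

# Call $visit for every number with $len distinct digits (no leading zero).
# Digits are taken in ascending order from the ones not used yet, so for a
# fixed length the numbers are visited in increasing numeric order.
sub each_distinct {
    my ( $len, $visit ) = @_;
    my $walk;
    $walk = sub {
        my ( $prefix, $left, $avail ) = @_;
        if ( $left == 0 ) { $visit->($prefix); return }
        for my $i ( 0 .. $#$avail ) {
            my @rest = @$avail;
            my ($d) = splice( @rest, $i, 1 );    # use digit $d, keep the rest
            next if $prefix eq '' and $d == 0;    # no leading zero
            $walk->( $prefix . $d, $left - 1, \@rest );
        }
    };
    $walk->( '', $len, [ 0 .. 9 ] );
    undef $walk;    # break the closure's reference to itself
}

# Count all such numbers with 1 .. $max_len digits and find the biggest jump
# between consecutive numbers of the same length.
sub beust {
    my $max_len = shift;
    my ( $count, $max_jump ) = ( 0, 0 );
    for my $len ( 1 .. $max_len ) {
        my $last;
        each_distinct(
            $len,
            sub {
                my $x = shift;
                $count++;
                if ( defined $last ) {
                    my $jump = $x - $last;
                    $max_jump = $jump if $jump > $max_jump;
                }
                $last = $x;
            }
        );
    }
    return ( $count, $max_jump );
}

my ( $count, $max_jump ) = beust(3);
print "$count $max_jump\n";    # 738 11
```

&lt;code&gt;beust(3)&lt;/code&gt; gives 738 numbers and a biggest jump of 11, the same as the third line of the Erlang output.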

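My Erlang version is more than twice as fast as &lt;a href="http://kevin.scaldeferri.com/blog/"&gt;Kevin's&lt;/a&gt; &lt;a href="http://kevin.scaldeferri.com/dist/fastbeust.erl"&gt;crazybob&lt;/a&gt; solution on my laptop (Kevin's ~18 s, mine ~8.5 s). Each entry of the output has the form {Digits, {Microseconds, {Count, MaxJump}}}, as returned by &lt;code&gt;timer:tc/3&lt;/code&gt;.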
&lt;pre&gt;&gt; [{X, timer:tc(cbchallenge, test, [X])} || X&lt;-lists:seq(1,10)].
[{1,{16,{9,1}}},
 {2,{52,{90,2}}},
 {3,{420,{738,11}}},
 {4,{3073,{5274,105}}},
 {5,{20019,{32490,1047}}},
 {6,{105757,{168570,10469}}},
 {7,{459955,{712890,104691}}},
 {8,{1587747,{2345850,1046913}}},
 {9,{4522205,{5611770,10469135}}},
 {10,{8563146,{8877690,104691357}}}]
&lt;/pre&gt;I guess there is a much faster solution closer to the original &lt;a href="http://crazybob.org/BeustSequence.java.html"&gt;crazybob's solution&lt;/a&gt;, which is arithmetic and applicable only to numbers. My approach works for combinations without repetition of members of any set, but it is slower.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-548660326747491046?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/548660326747491046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=548660326747491046' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/548660326747491046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/548660326747491046'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2008/09/beust-challenge-in-erlang.html' title='Beust Challenge in Erlang'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-695602036610574005</id><published>2008-02-01T02:54:00.000-08:00</published><updated>2008-02-01T04:04:22.758-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lisp'/><category scheme='http://www.blogger.com/atom/ns#' term='functional'/><category scheme='http://www.blogger.com/atom/ns#' term='arc'/><category scheme='http://www.blogger.com/atom/ns#' 
term='language'/><title type='text'>Arc - mostly macros and syntactic sugar</title><content type='html'>&lt;a href="http://www.innoq.com/blog/st/2008/01/arc-a-new-programming-language.html"&gt;Stefan Tilkov&lt;/a&gt; said about &lt;a target="_blank" href="http://arclanguage.org/"&gt;Arc&lt;/a&gt;:
&lt;blockquote&gt;After a quick glance at the &lt;a target="_blank" href="http://ycombinator.com/arc/tut.txt"&gt;tutorial&lt;/a&gt;, the most intriguing bit seems to be the support for macros, which work (almost) like function definitions. Interesting, but nothing that gets me overly excited.
&lt;/blockquote&gt;
I had the same experience. I read the &lt;a target="_blank" href="http://ycombinator.com/arc/tut.txt"&gt;tutorial&lt;/a&gt; two days ago, and I think Arc is mostly just Scheme with macros and syntactic sugar. I am not very familiar with Lisp and Scheme, but I think there is nothing in it that can't be done almost as simply in Scheme or other Lisp dialects.

Update:
&lt;a href="http://www.dekorte.com/"&gt;&lt;span class="menus"&gt;&lt;span class="selectedMenu"&gt;Steve dekorte&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; think similar:  &lt;span class="blogEntryText"&gt;&lt;blockquote&gt; My own impression of Arc is that it's not significantly different from Scheme.&lt;/blockquote&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-695602036610574005?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/695602036610574005/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=695602036610574005' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/695602036610574005'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/695602036610574005'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2008/02/arc-mostly-macros-and-syntactic-sugar.html' title='Arc - mostly macros and syntactic sugar'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-7514254375681416446</id><published>2007-11-02T02:42:00.000-07:00</published><updated>2007-11-02T08:50:24.794-07:00</updated><title type='text'>How much cores are using WF solutions</title><content type='html'>Tim Bray published &lt;a href="http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results"&gt;WF XI: Results&lt;/a&gt; and I 
wanted to know how many of all those CPU cores each solution uses, so I computed this table:
&lt;pre&gt;Name            Language    Elapsed     User        System  Parallel CPU work
-----------------------------------------------------------------------------
clv5            Gawk        46.73       40.63       6.1         1
tbray5          Erlang      01:04.32    35:33.35    00:45.84    33.88
wfinder1_1      Erlang      6.46        34.07       8.02        6.52
report-counts   Ruby        01:43.71    01:27.11    00:16.60    1
?               Groovy      02:21.83    02:22.97    00:19.95    1.15
wf_p            Ruby        50.16       37.58       12.5        1
wf-2            Python      41.04       34.8        6.24        1
wf-6(2)         Python      16.91       3.62        1.86        0.32
wf-6(4)         Python      9.08        3.66        1.89        0.61
wf-6(8)         Python      5.81        *           *           *
wf-6(16)        Python      4.38        *           *           *
wf              OCaml       49.69       41.94       7.75        1
widefinder      PHP         01:29.81    01:23.10    00:06.71    1
wf_pichi3       Erlang      8.28        51.98       9.38        7.41
tbray5          Erlang      00:20.74    03:51.33    08:00.00    34.3

Parallel CPU work = (User + System) / Elapsed; times are in seconds or mm:ss.ss; * = not available&lt;/pre&gt;
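A minimal sketch for recomputing the last column from the raw times. It assumes the column is (User + System) / Elapsed, which reproduces the published values (for example wf_pichi3: (51.98 + 9.38) / 8.28 is about 7.41), and that times are written either as seconds (46.73) or as minutes:seconds (01:04.32). The module name &lt;code&gt;cpu_ratio&lt;/code&gt; is hypothetical:
&lt;pre&gt;%% Hypothetical helper module: recomputes the "Parallel CPU work" column,
%% assuming it is (User + System) / Elapsed.
-module(cpu_ratio).
-export([seconds/1, ratio/3]).

%% seconds("46.73") -&amp;gt; 46.73, seconds("01:04.32") -&amp;gt; 64.32
seconds(Str) -&amp;gt;
    case string:tokens(Str, ":") of
        [Secs] -&amp;gt; decimal(Secs);
        [Mins, Secs] -&amp;gt; list_to_integer(Mins) * 60 + decimal(Secs)
    end.

%% decimal("04.32") -&amp;gt; 4.32; parsed by hand because list_to_float/1
%% rejects strings without a decimal point, such as "8"
decimal(Str) -&amp;gt;
    case string:tokens(Str, ".") of
        [Int] -&amp;gt; float(list_to_integer(Int));
        [Int, Frac] -&amp;gt;
            list_to_integer(Int) + list_to_integer(Frac) / math:pow(10, length(Frac))
    end.

%% average number of busy cores during the run
ratio(Elapsed, User, System) -&amp;gt;
    (seconds(User) + seconds(System)) / seconds(Elapsed).&lt;/pre&gt;
For example, &lt;code&gt;cpu_ratio:ratio("8.28", "51.98", "9.38")&lt;/code&gt; gives about 7.41, matching the wf_pichi3 row.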
It is nice that the Erlang implementations can use the cores well, but for this task they are generally not very efficient. Erlang manages parallel processes well, but the work done inside those processes could be better written in other languages and used as ports, especially when the task is string processing on a large amount of data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-7514254375681416446?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/7514254375681416446/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=7514254375681416446' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/7514254375681416446'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/7514254375681416446'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/11/how-much-cores-are-using-wf-solutions.html' title='How much cores are using WF solutions'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-1063696450863537442</id><published>2007-10-28T10:52:00.000-07:00</published><updated>2007-11-01T03:52:38.451-07:00</updated><title type='text'>Faster than ruby but scalable</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Update 2007-11-01:&lt;/span&gt; Correction of typo in wf_pichi3.erl.&lt;pre&gt;@@ -14,7 +14,7 @@

 -compile([native]).

-main([File]) -&amp;gt; start(File), halt().
+main([File]) -&amp;gt; start_bmets(File), halt().

 start_bmets(FileName) -&amp;gt;
     {ok, F} = nlt_reader:open(FileName),
&lt;/pre&gt;
I worked on the &lt;a target="_blank" href="http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder"&gt;Wide Finder Project&lt;/a&gt; again. And what happened? I improved Anders Nygren's &lt;a href="http://www.erlang.org/pipermail/erlang-questions/2007-October/030245.html"&gt;code&lt;/a&gt; with the suggestions from my previous blog post and also a big suggestion from Caoyuan's &lt;a href="http://blogtrader.org/page/dcaoyuan/entry/learning_coding_binary_was_tim"&gt;blog&lt;/a&gt;. First I made some bricks: &lt;a href="http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/16fc8af3/attachment-0002.obj"&gt;chunk_reader.erl&lt;/a&gt; with read-ahead and sequential reading support, &lt;a href="http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/16fc8af3/attachment-0003.obj"&gt;nlt_reader.erl&lt;/a&gt; with a concurrent splitter and concatenator of newline-terminated blocks, and the &lt;a href="http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/207cb882/attachment.obj"&gt;file_map_reduce.erl&lt;/a&gt; engine. Then I plugged them together into the &lt;a href="http://www.erlang.org/pipermail/erlang-questions/attachments/20071028/814bacad/attachment-0001.obj"&gt;wf_pichi3.erl&lt;/a&gt; wide finder. And what is great? It's about 40% faster than the Ruby code on a single core, and still scalable:
&lt;pre&gt;$ time ruby1.8 tbray.rb o1M.ap
8900: 2006/09/29/Dynamic-IDE
2000: 2006/07/28/Open-Data
1300: 2003/07/25/NotGaming
800: 2006/01/31/Data-Protection
800: 2003/09/18/NXML
800: 2003/10/16/Debbie
700: 2003/06/23/SamsPie
600: 2006/01/08/No-New-XML-Languages
600: 2005/11/03/Cars-and-Office-Suites
600: 2005/07/27/Atomic-RSS

real    0m7.469s
user    0m6.528s
sys     0m0.940s
$ time erl -noshell -run wf_pichi3 main o1M.ap
8900: 2006/09/29/Dynamic-IDE
2000: 2006/07/28/Open-Data
1300: 2003/07/25/NotGaming
800: 2003/09/18/NXML
800: 2003/10/16/Debbie
800: 2006/01/31/Data-Protection
700: 2003/06/23/SamsPie
600: 2006/01/08/No-New-XML-Languages
600: 2006/09/07/JRuby-guys
600: 2005/07/27/Atomic-RSS

real    0m5.370s
user    0m4.412s
sys     0m0.952s
&lt;/pre&gt;It's a big improvement over my last code, about 365% ;-) The good thing is that it's a nice jigsaw and very powerful. I think it is just what Tim Bray wanted when he started the Wide Finder Project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-1063696450863537442?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/1063696450863537442/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=1063696450863537442' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/1063696450863537442'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/1063696450863537442'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/faster-than-ruby-but-scalable.html' title='Faster than ruby but scalable'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-2450563018675312495</id><published>2007-10-27T02:40:00.000-07:00</published><updated>2007-10-27T03:32:54.317-07:00</updated><title type='text'>Scalable splitting is possible</title><content type='html'>In my &lt;a href="http://pichis-blog.blogspot.com/2007/10/wide-finder-project-fold.html"&gt;previous&lt;/a&gt; post I thought that reading and splitting are unscalable processes. That is not true. 
Reading is scalable, but on current hardware it is not useful, because sequential reading from disk is more than twenty times faster than random reading. But what about splitting the read chunks on new lines and concatenating the pieces? Splitting can be done in parallel, but what about concatenating? Yes, that can be done too, if I keep the sequence information and send each part to the correct process. So I wrote a scatter-gather algorithm with a splitter and a concatenator. I also changed fold-reduce to map-reduce.
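The concatenation rule can be illustrated without processes. This is a minimal sequential sketch, not the concurrent implementation in this post: every chunk after the first hands its head (the text up to the first new line) to the previous block, so each block ends on a line boundary. The module name &lt;code&gt;nlt_join_sketch&lt;/code&gt; is hypothetical, and &lt;code&gt;binary:split/2&lt;/code&gt; is only available in OTP releases newer than those of 2007:
&lt;pre&gt;%% Hypothetical module: a sequential version of the split/concatenate idea.
-module(nlt_join_sketch).
-export([nlt_blocks/1]).

%% Example:
%% nlt_blocks([&amp;lt;&amp;lt;"ab\ncd"&amp;gt;&amp;gt;, &amp;lt;&amp;lt;"e\nfg\n"&amp;gt;&amp;gt;])
%%     -&amp;gt; [&amp;lt;&amp;lt;"ab\ncde\n"&amp;gt;&amp;gt;, &amp;lt;&amp;lt;"fg\n"&amp;gt;&amp;gt;]
nlt_blocks([]) -&amp;gt; [];
nlt_blocks([First | Rest]) -&amp;gt; join(First, Rest).

join(&amp;lt;&amp;lt;&amp;gt;&amp;gt;, []) -&amp;gt; [];    % nothing left after the last new line
join(Acc, []) -&amp;gt; [Acc];
join(Acc, [Chunk | Rest]) -&amp;gt;
    case binary:split(Chunk, &amp;lt;&amp;lt;"\n"&amp;gt;&amp;gt;) of
        [Head, Tail] -&amp;gt;
            %% the head completes the previous block, the tail starts the next one
            [&amp;lt;&amp;lt;Acc/binary, Head/binary, "\n"&amp;gt;&amp;gt; | join(Tail, Rest)];
        [NoNewLine] -&amp;gt;
            %% no new line in this chunk (should be rare): glue it whole
            join(&amp;lt;&amp;lt;Acc/binary, NoNewLine/binary&amp;gt;&amp;gt;, Rest)
    end.&lt;/pre&gt;
In the concurrent version each worker performs the &lt;code&gt;[Head, Tail]&lt;/code&gt; step on its own chunk and sends the head to the previous worker's process.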
The code is split into three modules. The main module is &lt;code&gt;tbray_pichi1&lt;/code&gt;:
&lt;pre&gt;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Yet another Erlang solution to Tim Bray's Wide Finder project
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Hynek (Pichi) Vychodil (http://pichis-blog.blogspot.com/), 23 October 2007.

-module(tbray_pichi1).

-export([main/1, start/1, start/2, start/3]).

start(FileName, ChunkSize) -&amp;gt;
    start(FileName, ChunkSize,
          erlang:system_info(schedulers) * 8).

start(FileName) -&amp;gt; start(FileName, 1024 * 32).

main([FileName, ChunkSize, N]) -&amp;gt;
    start(FileName, list_to_integer(ChunkSize),
          list_to_integer(N)),
    halt();
main([FileName, ChunkSize]) -&amp;gt;
    start(FileName, list_to_integer(ChunkSize)), halt();
main([FileName]) -&amp;gt; start(FileName), halt().

start(FileName, ChunkSize, N) -&amp;gt;
    Start = now(),
    Result = nlt_map_reduce:map_reduce(
        FileName,
        _Mapper = fun (Chunk) -&amp;gt;
                scan(binary_to_list(Chunk), dict:new())
            end,
        _Reducer = fun (A, B) -&amp;gt;
                dict:merge(fun (_, V1, V2) -&amp;gt; V1 + V2 end, A, B)
            end,
        ChunkSize, N),
    Delta = timer:now_diff(now(), Start) / 1000,
    print_result(Result),
    if Delta &amp;gt; 1000 -&amp;gt;
           io:format("Time: ~.3f s~n", [Delta / 1000]);
       true -&amp;gt; io:format("Time: ~.3f ms~n", [Delta])
    end,
    ok.

print_result(Dict) -&amp;gt;
    [R1, R2, R3, R4, R5, R6, R7, R8, R9, R10 | _] =
        lists:reverse(lists:keysort(2, dict:to_list(Dict))),
    lists:foreach(fun ({Word, Count}) -&amp;gt;
                          io:format("~p get requests for ~s~n", [Count, Word])
                  end,
                  [R1, R2, R3, R4, R5, R6, R7, R8, R9, R10]).

scan("GET /ongoing/When/" ++
       [_, _, _, $x, $/, Y1, Y2, Y3, Y4, $/, M1, M2, $/, D1,
        D2, $/
        | Rest],
     Dict) -&amp;gt;
    case scan_key(Rest) of
      {[_ | _] = Key, NewRest} -&amp;gt;
          scan(NewRest,
               dict:update_counter(
                    [Y1, Y2, Y3, Y4, $/, M1, M2, $/, D1, D2, $/ | Key],
                    1, Dict));
      {[], NewRest} -&amp;gt; scan(NewRest, Dict)
    end;
scan([_ | Rest], Dict) -&amp;gt; scan(Rest, Dict);
scan([], Dict) -&amp;gt; Dict.

scan_key(L) -&amp;gt; scan_key(L, []).

scan_key([$\s | Rest], Key) -&amp;gt;
    {lists:reverse(Key), Rest};
scan_key([$\n | Rest], _) -&amp;gt; {[], Rest};
scan_key([$. | Rest], _) -&amp;gt; {[], Rest};
scan_key([C | Rest], Key) -&amp;gt; scan_key(Rest, [C | Key]);
scan_key([], _) -&amp;gt; {[], []}.&lt;/pre&gt;
The second module is the map-reducer for newline-terminated chunks (&lt;code&gt;nlt_map_reduce&lt;/code&gt;):&lt;pre&gt;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MapReduce for new line terminated blocks of file
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Hynek (Pichi) Vychodil (http://pichis-blog.blogspot.com/), 22 October 2007.

-module(nlt_map_reduce).

%-compile([native, {hipe, [o3]}]).
% chunk_reader functions are called fully qualified, so no -import is needed

-export([map_reduce/5]).

% mapper reducer process info
-record(map_reduce,
        {mapper, reducer, result = '', data, receiver}).

% process spawner loop info
-record(context,
        {mr, cr, proc = [], n = 0, max = 1, maxR = 1}).

map_reduce(FileName, Mapper, Reducer, ChunkSize, N) -&amp;gt;
    {ok, ChunkReader} = chunk_reader:open(FileName,
                                          ChunkSize),
    Result = start(#map_reduce{mapper = Mapper,
                               reducer =
                                   fun ('', A) -&amp;gt; A;
                                       (A, '') -&amp;gt; A;
                                       (A, B) -&amp;gt; Reducer(A, B)
                                   end,
                               receiver = self()},
                   ChunkReader, N),
    ok = chunk_reader:close(ChunkReader),
    Result.

start(MR, ChunkReader, N) -&amp;gt;
    spawn_proc(#context{mr = MR, cr = ChunkReader, max = N},
               chunk_reader:read(ChunkReader), you_first).

loop(#context{cr = ChunkReader, proc = [{_, Last} | _],
              n = N, max = Max} =
         C)
    when N &amp;lt; Max -&amp;gt;
    receive
      map_done -&amp;gt; loop(C#context{n = N - 1})
      after 0 -&amp;gt;
                spawn_proc(C, chunk_reader:read(ChunkReader), Last)
    end;
loop(#context{n = N} = C) -&amp;gt;
    receive map_done -&amp;gt; loop(C#context{n = N - 1}) end.

spawn_proc(#context{mr = MR, n = N, proc = Proc,
                    maxR = R} =
               C,
           {ok, Chunk}, Last) -&amp;gt;
    loop(C#context{
           proc = send_result_requests(
                [{0, spawn_opt(fun () -&amp;gt;
                            split(MR#map_reduce{data = Chunk}, Last)
                        end,
                        [link])}
                | Proc]),
           n = N + 1,
           maxR =
               if R &amp;gt; N + 1 -&amp;gt; R;
                  true -&amp;gt; N + 1
               end});
spawn_proc(_, eof, you_first) -&amp;gt;
    '';                     % empty file
spawn_proc(#context{proc = Proc}, eof,
           Last) -&amp;gt;         % finalise
    Last ! you_last,
    send_final_result_requests(Proc),
    wait_for_result().

wait_for_result() -&amp;gt;
    receive
      {result, Result} -&amp;gt; Result;
      _ -&amp;gt; wait_for_result()  % clear pending messages if any
    end.

send_result_requests([{L1, P1}, {L1, P2} | T]) -&amp;gt;
    P2 ! {send_result_to, P1, L1},
    send_result_requests([{L1 + 1, P1} | T]);
send_result_requests(T) -&amp;gt; T.

send_final_result_requests(T) -&amp;gt;
    [{L1, P} | R] = lists:reverse(T),
    L = send_final_result_requests_i(L1, P, R),
    P ! {send_result_to, self(), L}.

send_final_result_requests_i(L, P, [{L1, P1} | T]) -&amp;gt;
    P1 ! {send_result_to, P, L1},
    send_final_result_requests_i(L + 1, P, T);
send_final_result_requests_i(L, _, []) -&amp;gt; L.

% mapper reducer process states
split(#map_reduce{data = Data} = MR, you_first) -&amp;gt;
    map_it(MR#map_reduce{data = join_next(Data)});
split(#map_reduce{data = Data} = MR, PrevPid) -&amp;gt;
    case split_on_nl(Data) of
      {_, none} -&amp;gt;    % do nothing yourself, send it
          PrevPid ! {your_next_part, join_next(Data)},
          map_done(MR#map_reduce{data = done});
      {Line, Rest} -&amp;gt;
          PrevPid ! {your_next_part, Line},
          map_it(MR#map_reduce{data = join_next(Rest)})
    end.

join_next(Data) -&amp;gt;
    receive
      {your_next_part, Next} -&amp;gt; &amp;lt;&amp;lt;Data/binary, Next/binary&amp;gt;&amp;gt;;
      you_last -&amp;gt; Data
    end.

map_it(#map_reduce{mapper = Mapper, data = Data} =
           MR) -&amp;gt;
    map_done(MR#map_reduce{data = done,
                           result = Mapper(Data)}).

map_done(#map_reduce{receiver = Master} = MR) -&amp;gt;
    Master ! map_done,      % notice master you done map
    reduce_and_wait(MR, 0).

reduce_and_wait(#map_reduce{result = Acc,
                            reducer = Reducer} =
                    MR,
                N) -&amp;gt;
    receive
      {send_result_to, Receiver, WaitForN} -&amp;gt;
          reduce(MR#map_reduce{receiver = Receiver}, N, WaitForN);
      {result, Result} -&amp;gt;
          reduce_and_wait(MR#map_reduce{result =
                                            Reducer(Acc, Result)},
                          N + 1)
    end.

reduce(#map_reduce{result = Acc, reducer = Reducer} =
           MR,
       N, WaitForN)
    when N &amp;lt; WaitForN -&amp;gt;
    receive
      {result, Result} -&amp;gt;
          reduce(MR#map_reduce{result = Reducer(Acc, Result)},
                 N + 1, WaitForN)
    end;
reduce(#map_reduce{receiver = Receiver,
                   result = Result},
       _, _) -&amp;gt;
    Receiver ! {result, Result}.    % We are finished

%splitter
split_on_nl(B) -&amp;gt; split_on_nl(B, 0, size(B)).

split_on_nl(B, N, S) when N &amp;lt; S -&amp;gt;
    case B of
      &amp;lt;&amp;lt;Line:N/binary, $\n, Tail/binary&amp;gt;&amp;gt; -&amp;gt; {Line, Tail};
      _ -&amp;gt; split_on_nl(B, N + 1, S)
    end;
split_on_nl(B, _, _) -&amp;gt; {B, none}.&lt;/pre&gt;
And the last module is the read-ahead chunk reader (&lt;code&gt;chunk_reader&lt;/code&gt;):&lt;pre&gt;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chunk reader process with read ahead
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Hynek (Pichi) Vychodil (http://pichis-blog.blogspot.com/), 22 October 2007.

-module(chunk_reader).

-export([close/1, open/2, read/1]).

-export([test_read/2]).

% how many second have to wait for response
-define(TIMEOUT, 60).

% open(FileName, ChunkSize) -&amp;gt; {ok, ChunkReader}
open(FileName, ChunkSize) -&amp;gt;
    M = self(),
    {ok,
     {chunk_reader,
      spawn_opt(fun () -&amp;gt;
                        {ok, File} = file:open(FileName, [read, raw, binary]),
                        process_flag(trap_exit, true),
                        loop(M, File, file:read(File, ChunkSize), ChunkSize)
                end,
                [link, {priority, high}])}}.

% close(ChunkReader) -&amp;gt; ok | {error, invalid}
close({chunk_reader, Pid}) when is_pid(Pid) -&amp;gt;
    case is_process_alive(Pid) of
      true -&amp;gt; Pid ! close, ok;
      false -&amp;gt; {error, invalid}
    end.

% read(ChunkReader) -&amp;gt; eof | {ok, Data} | {error, invalid | closed}
read({chunk_reader, Pid}) when is_pid(Pid) -&amp;gt;
    case is_process_alive(Pid) of
      true -&amp;gt; Pid ! {read, self()}, wait_response(Pid, 0);
      false -&amp;gt; {error, invalid}
    end.

wait_response(Pid, N) when N &amp;lt; (?TIMEOUT) -&amp;gt;
    receive
      {ok, _} = Msg -&amp;gt; Msg;
      eof -&amp;gt; eof
      after 1000 -&amp;gt;   % take it long?
                case is_process_alive(Pid) of
                  true -&amp;gt; wait_response(Pid, N + 1);
                  false -&amp;gt; {error, closed}
                end
    end;
wait_response(_, _) -&amp;gt; {error, timeout}.

loop(Master, File, Chunk, ChunkSize) -&amp;gt;
    receive
      {read, From} -&amp;gt;
          From ! Chunk,
          case Chunk of
            {ok, _} -&amp;gt;
                loop(Master, File, file:read(File, ChunkSize),
                     ChunkSize);
            eof -&amp;gt; file:close(File), eof_loop(Master)
          end;
      close -&amp;gt; file:close(File);
      {'EXIT', Master, _} -&amp;gt; file:close(File);
      _ -&amp;gt;
          loop(Master, File, Chunk, ChunkSize)  % ignore unknown messages
    end.

eof_loop(Master) -&amp;gt;  % wait for eof request
    receive
      {read, From} -&amp;gt; From ! eof, eof_loop(Master);
      close -&amp;gt; ok;
      {'EXIT', Master, _} -&amp;gt; ok;
      _ -&amp;gt; eof_loop(Master)
    end.

% speed testing function
% test_read(FileName, ChunkSize) -&amp;gt; ok | {error, invalid}
test_read(FileName, ChunkSize) -&amp;gt;
    {ok, File} = open(FileName, ChunkSize),
    eof = test_read_loop(File, read(File)),
    close(File).

test_read_loop(File, {ok, _}) -&amp;gt;
    test_read_loop(File, read(File));
test_read_loop(_, eof) -&amp;gt; eof.&lt;/pre&gt;
But the &lt;code&gt;nlt_map_reduce&lt;/code&gt; code is too complicated, hard to read and, worst of all, 20% slower on a single core. All this indicates that there is some problem, and I think it is the sending of dictionaries between processes. A dictionary is copied for every chunk, and that costs too much. So I want to rewrite it into nicer code and fall back to the fold-reduce concept, because that concept sends fewer dictionaries.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-2450563018675312495?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/2450563018675312495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=2450563018675312495' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/2450563018675312495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/2450563018675312495'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/scalable-splitting-is-possible.html' title='Scalable splitting is possible'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-3218067661678418389</id><published>2007-10-21T10:38:00.000-07:00</published><updated>2007-10-21T11:43:34.370-07:00</updated><title type='text'>Wide Finder Project - fold&amp;reduce</title><content type='html'>I have made a new version of my solution for the &lt;a target="_blank" 
href="http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder"&gt;Wide Finder Project&lt;/a&gt;. I was inspired by last Caoyuan's last &lt;a href="http://blogtrader.net/page/dcaoyuan/entry/learning_coding_parallelization_was_tim"&gt;work&lt;/a&gt;. I thought about i/o operation too, but I think, parallelisation of file reading is not good idea. Instead of this I tried split file reading and new line finding in two independent processes. One process which read and one which searching for new line. It's looks expensive, send big messages, but I send binaries and binaries less than 64 bytes are not copied, but only pointers passed.
I also look for the new line from the head of the chunk, because I think binary splitting is faster when the first part is smaller than the second: the second part can stay in place, only a pointer is moved, and the smaller first part is copied to a new position. But gluing the tail of the previous chunk to the head of the current one means copying the bigger part, and that is expensive. It looks the same as what Caoyuan does, except that I don't do it in the splitter but in the worker. I send both parts as separate binaries, which is cheap.
Why all this? To keep the work done in any single process minimal. One process only reads, as fast as possible. One process only splits on new lines and does no gluing, and all the other work can be done in parallel.
But when the splitter asks the reader for a new chunk, it should not have to wait while the reader reads it. It is better if the reader already has the next chunk prepared. The same goes for the splitter: it should have the next chunk split before any worker asks for new parts. So I made a read-ahead file reader and a chunk splitter.
&lt;pre&gt;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chunk reader process with read ahead
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

file_open(FileName, ChunkSize, chunk) -&amp;gt;        % raw chunks
    M = self(),
    {ok, {chunk_reader, spawn_link(fun() -&amp;gt;
            {ok, File} = file:open(FileName, [read, raw, binary]),
            process_flag(trap_exit, true),
            process_flag(priority, high),
            file_loop(M, File, file:read(File, ChunkSize), ChunkSize)
        end)}};
file_open(FileName, ChunkSize, nlt_chunk) -&amp;gt;    % new line terminated chunks
    M = self(),
    {ok, {nlt_chunk_reader, spawn_link(fun() -&amp;gt;
            {ok, CR} = file_open(FileName, ChunkSize, chunk),
            process_flag(trap_exit, true),
            process_flag(priority, high),
            {ok, First_Read} = file_read(CR),
            cr_loop(
                M,
                CR,
                cr_read_n_split(CR, First_Read, file_read(CR)))
        end)}}.

file_read({Type, Pid}) when Type == chunk_reader; Type == nlt_chunk_reader -&amp;gt;
    case is_process_alive(Pid) of
        true -&amp;gt;
            Pid ! {read, self()},
            receive
                {ok, B} -&amp;gt; {ok, B};
                eof -&amp;gt; eof
            after 60000 -&amp;gt; timeout   % Possible race condition with is_process_alive
            end;
        false -&amp;gt; error
    end.

file_close({Type, Pid}) when Type == chunk_reader; Type == nlt_chunk_reader -&amp;gt;
    case is_process_alive(Pid) of
        true -&amp;gt; Pid ! close, ok;
        false -&amp;gt; error
    end.

file_loop(Master, File, Chunk, ChunkSize) -&amp;gt;
    receive
        {read, From} -&amp;gt;
            From ! Chunk,
            case Chunk of
                {ok, _} -&amp;gt;
                    file_loop(Master, File, file:read(File, ChunkSize), ChunkSize);
                eof -&amp;gt;
                    file:close(File),
                    file_eof_loop(Master)
            end;
        close -&amp;gt; file:close(File);
        {'EXIT', Master, _} -&amp;gt; file:close(File);
        _ -&amp;gt; file_loop(Master, File, Chunk, ChunkSize)  % ignore unknown messages
    end.

file_eof_loop(Master) -&amp;gt;  % wait for eof request
    receive
        {read, From} -&amp;gt;
            From ! eof,
            file_eof_loop(Master);
        close -&amp;gt; ok;
        {'EXIT', Master, _} -&amp;gt; ok;
        _ -&amp;gt; file_eof_loop(Master)
    end.

cr_loop(Master, CR, {Prev, Line, Next}) -&amp;gt;
    receive
        {read, From} -&amp;gt;
            From ! {ok, {Prev, Line}},
            case Next of
                _ when is_binary(Next) -&amp;gt;
                    cr_loop(Master, CR, cr_read_n_split(CR, Next, file_read(CR)));
                eof -&amp;gt;
                    file_close(CR),
                    file_eof_loop(Master)
            end;
        close -&amp;gt; file_close(CR);
        {'EXIT', Master, _} -&amp;gt; file_close(CR);
        _ -&amp;gt; cr_loop(Master, CR, {Prev, Line, Next})    % ignore unknown messages
    end.

cr_read_n_split(CR, Prev, {ok, B}) -&amp;gt;
    case split_on_nl(B) of
        {Line, Rest} when is_binary(Rest) -&amp;gt;    % nonempty remaining part
            {Prev, Line, Rest};
        {Line, none} -&amp;gt; % new line not found, read again, should be very rare
            cr_read_n_split(CR, &amp;lt;&amp;lt;Prev/binary, Line/binary&amp;gt;&amp;gt;, file_read(CR))
    end;
cr_read_n_split(_CR, Prev, eof) -&amp;gt;
    {&amp;lt;&amp;lt;&amp;gt;&amp;gt;, Prev, eof}.  % easier joining at this order

split_on_nl(B) -&amp;gt; split_on_nl(B, 0, size(B)).

split_on_nl(B, N, S) when N &amp;lt; S -&amp;gt;
    case B of
        &amp;lt;&amp;lt;Line:N/binary, $\n, Tail/binary&amp;gt;&amp;gt; -&amp;gt; {Line, Tail};
        _ -&amp;gt; split_on_nl(B, N+1, S)
    end;
split_on_nl(B, _, _) -&amp;gt; {B, none}.

% speed testing functions
file_test_read(FileName, ChunkSize, Type) -&amp;gt;
    {ok, File} = file_open(FileName, ChunkSize, Type),
    eof = file_test_read_loop(File, file_read(File)),
    file_close(File).

file_test_read_loop(File, {ok, _}) -&amp;gt;
    file_test_read_loop(File, file_read(File));
file_test_read_loop(_, eof) -&amp;gt;
    eof.
&lt;/pre&gt;
With these file-like devices ready, I turned to Tim Bray's request for more readable and cleaner code. So what do I want? Something like map_reduce, but not exactly map_reduce; it is more like fold_reduce. I want to fold over each chunk (that is, scan it for some pattern), then collect all the results, and do it all in parallel. So I made a fold_reduce operator over newline-terminated chunks read from a file: fold_reduce_file.
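The fold_reduce idea can also be tried without the file machinery. Here is a minimal in-memory sketch, not the implementation from this post: a dispenser process hands out chunks from a list, N workers fold the chunks they receive into private accumulators, and the caller reduces the N results. The module name &lt;code&gt;fold_reduce_sketch&lt;/code&gt; is hypothetical, and, like the fold_reduce_file code, it assumes Acc0 is a neutral element for the reducer (for example 0 for addition, or an empty dictionary for dict:merge/3):
&lt;pre&gt;%% Hypothetical module: in-memory fold_reduce over a list of chunks.
-module(fold_reduce_sketch).
-export([fold_reduce/5]).

%% Example: summing numbers with two workers
%% fold_reduce([[1,2],[3],[4,5,6]], 0,
%%             fun (Acc, C) -&amp;gt; Acc + lists:sum(C) end,
%%             fun (A, B) -&amp;gt; A + B end, 2)  -&amp;gt;  21

%% Acc0 must be neutral for Reducer, because every worker starts from it
fold_reduce(Chunks, Acc0, Folder, Reducer, N) when N &amp;gt; 0 -&amp;gt;
    Self = self(),
    Dispenser = spawn_link(fun () -&amp;gt; dispense(Chunks) end),
    _ = [spawn_link(fun () -&amp;gt; Self ! {result, worker(Dispenser, Folder, Acc0)} end)
         || _ &amp;lt;- lists:seq(1, N)],
    Result = collect(N, Acc0, Reducer),
    Dispenser ! stop,
    Result.

%% hands out one chunk per request, then eof until told to stop
dispense(Chunks) -&amp;gt;
    receive
        {next, From} -&amp;gt;
            case Chunks of
                [Chunk | Rest] -&amp;gt; From ! {chunk, Chunk}, dispense(Rest);
                [] -&amp;gt; From ! eof, dispense([])
            end;
        stop -&amp;gt; ok
    end.

%% folds every chunk it gets into its own accumulator
worker(Dispenser, Folder, Acc) -&amp;gt;
    Dispenser ! {next, self()},
    receive
        {chunk, Chunk} -&amp;gt; worker(Dispenser, Folder, Folder(Acc, Chunk));
        eof -&amp;gt; Acc
    end.

%% reduces the N worker results as they arrive
collect(0, Acc, _Reducer) -&amp;gt; Acc;
collect(N, Acc, Reducer) -&amp;gt;
    receive
        {result, Result} -&amp;gt; collect(N - 1, Reducer(Acc, Result), Reducer)
    end.&lt;/pre&gt;
Because each worker sends only one accumulator at the end, the number of accumulators copied between processes equals the number of workers, not the number of chunks.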
&lt;pre&gt;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Proof of concept of fold&amp;amp;reduce
% on file by new line terminated chunks
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-record(context, {acc,
               chunkNum,
               processedNum = 0,
               reducer}).

fold_reduce_file(FileName, Acc0, Folderer, Reducer, ChunkSize, N) -&amp;gt;
 {ok, CR} = file_open(FileName, ChunkSize, nlt_chunk),
 M = self(),
 do_n(fun() -&amp;gt;
     spawn_link(fun()-&amp;gt; folderer(CR, Acc0, Folderer, M) end)
     end, 0, N),
 Result = collect_loop(#context{
     acc=Acc0,
     chunkNum=N,
     reducer=Reducer}),
 file_close(CR),
 Result.

do_n(What, Start, Stop) when Start &amp;lt; Stop -&amp;gt; What(), do_n(What, Start+1, Stop);
do_n(_, _, _) -&amp;gt; ok.

folderer(CR, Acc0, Folderer, Collector) -&amp;gt;
 case file_read(CR) of
     {ok, {A, B}} -&amp;gt;
         folderer(CR, Folderer(
                 Acc0,
                 binary_to_list(A) ++ binary_to_list(B)
             ), Folderer, Collector);
     eof -&amp;gt;
         Collector ! {result, Acc0}
 end.

collect_loop(#context{acc=Acc0,
                   chunkNum=ChunkNum,
                   processedNum=ProcessedNum,
                   reducer=Reducer}=Context) -&amp;gt;
 case ProcessedNum of
     ChunkNum -&amp;gt;
         Acc0;
     _ -&amp;gt;
         receive
             {result, Result} -&amp;gt;
                 collect_loop(Context#context{
                     acc = Reducer(Acc0, Result),
                     processedNum = ProcessedNum+1})
         end
 end.
&lt;/pre&gt;It looks complicated, but it's only infrastructure. Now I have a tool that makes Tim Bray's exercise easy, and not only this one but any similar task. Here is Tim Bray's exercise implemented with this tool:
&lt;pre&gt;start(FileName) -&amp;gt; start(FileName, 1024*32, 1).
start(FileName, N) -&amp;gt; start(FileName, 1024*32, N).
start(FileName, ChunkSize, N) -&amp;gt;
 Start = now(),
 Result = fold_reduce_file(
     FileName,
     _Acc0 = dict:new(),
     _Folderer = fun(Acc, Chunk) -&amp;gt; scan(Chunk, Acc) end,
     _Reducer = fun(Acc, Result) -&amp;gt; dict:merge(
             fun(_,V1,V2) -&amp;gt; V1+V2 end,
             Acc,
             Result
         ) end,
     ChunkSize,
     N
 ),
 Delta = timer:now_diff(now(), Start) / 1000,
 print_result(Result),
 if
     Delta &amp;gt; 1000 -&amp;gt; io:format("Time: ~.3f s~n", [Delta/1000]);
     true -&amp;gt; io:format("Time: ~.3f ms~n", [Delta])
 end,
 ok.

print_result(Dict) -&amp;gt;
 % Ten most requested keys; lists:sublist/2 also copes with fewer than ten
 % keys, where matching [R1, ..., R10 | _] would crash with badmatch.
 Top10 = lists:sublist(lists:reverse(lists:keysort(2, dict:to_list(Dict))), 10),
 lists:foreach(fun ({Word, Count}) -&amp;gt;
                       io:format("~p get requests for ~s~n", [Count, Word])
               end, Top10).

scan("GET /ongoing/When/" ++ [_,_,_,$x,$/,Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/|Rest], Dict) -&amp;gt;
 case scan_key(Rest) of
     {[_|_] = Key, NewRest} -&amp;gt;
         scan(NewRest, dict:update_counter([Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/|Key], 1, Dict));
     {[], NewRest} -&amp;gt; scan(NewRest, Dict)
 end;
scan([_|Rest], Dict) -&amp;gt; scan(Rest, Dict);
scan([], Dict) -&amp;gt; Dict.

scan_key(L) -&amp;gt; scan_key(L, []).

scan_key([$ |Rest], Key) -&amp;gt; {lists:reverse(Key), Rest};
scan_key([$\n|Rest], _) -&amp;gt; {[], Rest};
scan_key([$.|Rest], _) -&amp;gt; {[], Rest};
scan_key([C|Rest], Key) -&amp;gt; scan_key(Rest, [C|Key]);
scan_key([], _) -&amp;gt; {[],[]}.
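
% A sketch, not part of the original tbray6 code: the same fold_reduce_file/6
% reused for another line-oriented task, counting "GET " occurrences in a log.
% count_gets/2 and count_get/2 are hypothetical names. The count should be
% exact because chunks are cut only at new lines, so no "GET " can straddle
% two chunks. Acc0 = 0 is given to every folderer and to the collector.
count_gets(FileName, N) -&amp;gt;
 fold_reduce_file(
     FileName,
     _Acc0 = 0,
     _Folderer = fun(Acc, Chunk) -&amp;gt; count_get(Chunk, Acc) end,
     _Reducer = fun(Acc, Result) -&amp;gt; Acc + Result end,
     1024*32,
     N
 ).

count_get("GET " ++ Rest, Acc) -&amp;gt; count_get(Rest, Acc + 1);
count_get([_|Rest], Acc) -&amp;gt; count_get(Rest, Acc);
count_get([], Acc) -&amp;gt; Acc.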
&lt;/pre&gt;The good news is that this is faster than my &lt;a href="http://pichis-blog.blogspot.com/2007/10/binaries-realy-faster-than-lists.html"&gt;tbray2&lt;/a&gt; and also Caoyuan's &lt;a href="http://blogtrader.net/page/dcaoyuan/entry/learning_coding_parallelization_was_tim"&gt;tbray4&lt;/a&gt; on a single core. But I can't test it on a multi-core machine now. The complete source code for testing is below. When N is the number of processor threads, it would be interesting to test with N-1, N, 2*N-1 or 2*N folderer processes.
&lt;pre&gt;-module(tbray6).

%-compile([debug_info, native, {hipe, [o3]}]).

-export([start/1, start/2, start/3]).
-export([file_open/3, file_read/1, file_close/1, file_test_read/3]).
-export([fold_reduce_file/6]).

start(FileName) -&amp;gt; start(FileName, 1024*32, 1).
start(FileName, N) -&amp;gt; start(FileName, 1024*32, N).
start(FileName, ChunkSize, N) -&amp;gt;
 Start = now(),
 Result = fold_reduce_file(
     FileName,
     _Acc0 = dict:new(),
     _Folderer = fun(Acc, Chunk) -&amp;gt; scan(Chunk, Acc) end,
     _Reducer = fun(Acc, Result) -&amp;gt; dict:merge(
             fun(_,V1,V2) -&amp;gt; V1+V2 end,
             Acc,
             Result
         ) end,
     ChunkSize,
     N
 ),
 Delta = timer:now_diff(now(), Start) / 1000,
 print_result(Result),
 if
     Delta &amp;gt; 1000 -&amp;gt; io:format("Time: ~.3f s~n", [Delta/1000]);
     true -&amp;gt; io:format("Time: ~.3f ms~n", [Delta])
 end,
 ok.

print_result(Dict) -&amp;gt;
 % Ten most requested keys; lists:sublist/2 also copes with fewer than ten
 % keys, where matching [R1, ..., R10 | _] would crash with badmatch.
 Top10 = lists:sublist(lists:reverse(lists:keysort(2, dict:to_list(Dict))), 10),
 lists:foreach(fun ({Word, Count}) -&amp;gt;
                       io:format("~p get requests for ~s~n", [Count, Word])
               end, Top10).

scan("GET /ongoing/When/" ++ [_,_,_,$x,$/,Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/|Rest], Dict) -&amp;gt;
 case scan_key(Rest) of
     {[_|_] = Key, NewRest} -&amp;gt;
         scan(NewRest, dict:update_counter([Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/|Key], 1, Dict));
     {[], NewRest} -&amp;gt; scan(NewRest, Dict)
 end;
scan([_|Rest], Dict) -&amp;gt; scan(Rest, Dict);
scan([], Dict) -&amp;gt; Dict.

scan_key(L) -&amp;gt; scan_key(L, []).

scan_key([$ |Rest], Key) -&amp;gt; {lists:reverse(Key), Rest};
scan_key([$\n|Rest], _) -&amp;gt; {[], Rest};
scan_key([$.|Rest], _) -&amp;gt; {[], Rest};
scan_key([C|Rest], Key) -&amp;gt; scan_key(Rest, [C|Key]);
scan_key([], _) -&amp;gt; {[],[]}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Proof of concept of fold&amp;amp;reduce
% on file by new line terminated chunks
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-record(context, {acc,
               chunkNum,
               processedNum = 0,
               reducer}).

fold_reduce_file(FileName, Acc0, Folderer, Reducer, ChunkSize, N) -&amp;gt;
 {ok, CR} = file_open(FileName, ChunkSize, nlt_chunk),
 M = self(),
 do_n(fun() -&amp;gt;
     spawn_link(fun()-&amp;gt; folderer(CR, Acc0, Folderer, M) end)
     end, 0, N),
 Result = collect_loop(#context{
     acc=Acc0,
     chunkNum=N,
     reducer=Reducer}),
 file_close(CR),
 Result.

do_n(What, Start, Stop) when Start &amp;lt; Stop -&amp;gt; What(), do_n(What, Start+1, Stop);
do_n(_, _, _) -&amp;gt; ok.

folderer(CR, Acc0, Folderer, Collector) -&amp;gt;
 case file_read(CR) of
     {ok, {A, B}} -&amp;gt;
         folderer(CR, Folderer(
                 Acc0,
                 binary_to_list(A) ++ binary_to_list(B)
             ), Folderer, Collector);
     eof -&amp;gt;
         Collector ! {result, Acc0}
 end.

collect_loop(#context{acc=Acc0,
                   chunkNum=ChunkNum,
                   processedNum=ProcessedNum,
                   reducer=Reducer}=Context) -&amp;gt;
 case ProcessedNum of
     ChunkNum -&amp;gt;
         Acc0;
     _ -&amp;gt;
         receive
             {result, Result} -&amp;gt;
                 collect_loop(Context#context{
                     acc = Reducer(Acc0, Result),
                     processedNum = ProcessedNum+1})
         end
 end.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chunk reader process with read ahead
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

file_open(FileName, ChunkSize, chunk) -&amp;gt;        % raw chunks
 M = self(),
 {ok, {chunk_reader, spawn_link(fun() -&amp;gt;
         {ok, File} = file:open(FileName, [read, raw, binary]),
         process_flag(trap_exit, true),
         process_flag(priority, high),
         file_loop(M, File, file:read(File, ChunkSize), ChunkSize)
     end)}};
file_open(FileName, ChunkSize, nlt_chunk) -&amp;gt;    % new line terminated chunks
 M = self(),
 {ok, {nlt_chunk_reader, spawn_link(fun() -&amp;gt;
         {ok, CR} = file_open(FileName, ChunkSize, chunk),
         process_flag(trap_exit, true),
         process_flag(priority, high),
         {ok, First_Read} = file_read(CR),
         cr_loop(
             M,
             CR,
             cr_read_n_split(CR, First_Read, file_read(CR)))
     end)}}.

file_read({Type, Pid}) when Type == chunk_reader; Type == nlt_chunk_reader -&amp;gt;
 case is_process_alive(Pid) of
     true -&amp;gt;
         Pid ! {read, self()},
         receive
             {ok, B} -&amp;gt; {ok, B};
             eof -&amp;gt; eof
         after 60000 -&amp;gt; timeout   % Possible race condition with is_process_alive
         end;
     false -&amp;gt; error
 end.

file_close({Type, Pid}) when Type == chunk_reader; Type == nlt_chunk_reader -&amp;gt;
 case is_process_alive(Pid) of
     true -&amp;gt; Pid ! close, ok;
     false -&amp;gt; error
 end.

file_loop(Master, File, Chunk, ChunkSize) -&amp;gt;
 receive
     {read, From} -&amp;gt;
         From ! Chunk,
         case Chunk of
             {ok, _} -&amp;gt;
                 file_loop(Master, File, file:read(File, ChunkSize), ChunkSize);
             eof -&amp;gt;
                 file:close(File),
                 file_eof_loop(Master)
         end;
     close -&amp;gt; file:close(File);
     {'EXIT', Master, _} -&amp;gt; file:close(File);
     _ -&amp;gt; file_loop(Master, File, Chunk, ChunkSize)  % ignore unknown messages
 end.

file_eof_loop(Master) -&amp;gt;  % wait for eof request
 receive
     {read, From} -&amp;gt;
         From ! eof,
         file_eof_loop(Master);
     close -&amp;gt; ok;
     {'EXIT', Master, _} -&amp;gt; ok;
     _ -&amp;gt; file_eof_loop(Master)
 end.

cr_loop(Master, CR, {Prev, Line, Next}) -&amp;gt;
 receive
     {read, From} -&amp;gt;
         From ! {ok, {Prev, Line}},
         case Next of
             _ when is_binary(Next) -&amp;gt;
                 cr_loop(Master, CR, cr_read_n_split(CR, Next, file_read(CR)));
             eof -&amp;gt;
                 file_close(CR),
                 file_eof_loop(Master)
         end;
     close -&amp;gt; file_close(CR);
     {'EXIT', Master, _} -&amp;gt; file_close(CR);
     _ -&amp;gt; cr_loop(Master, CR, {Prev, Line, Next})    % ignore unknown messages
 end.

cr_read_n_split(CR, Prev, {ok, B}) -&amp;gt;
 case split_on_nl(B) of
     {Line, Rest} when is_binary(Rest) -&amp;gt;    % nonempty remaining part
         { Prev, Line, Rest };
     {Line, none} -&amp;gt; % new line not found, read again, should be very rare
         cr_read_n_split(CR, &amp;lt;&amp;lt;Prev/binary, Line/binary&amp;gt;&amp;gt;, file_read(CR))
 end;
cr_read_n_split(_CR, Prev, eof) -&amp;gt;
 {&amp;lt;&amp;lt;&amp;gt;&amp;gt;, Prev, eof}.  % this order makes joining easier

split_on_nl(B) -&amp;gt; split_on_nl(B, 0, size(B)).

split_on_nl(B, N, S) when N &amp;lt; S -&amp;gt;
 case B of
     &amp;lt;&amp;lt;Line:N/binary, $\n, Tail/binary&amp;gt;&amp;gt; -&amp;gt; {Line, Tail};
     _ -&amp;gt; split_on_nl(B, N+1, S)
 end;
split_on_nl(B, _, _) -&amp;gt; {B, none}.

% speed testing functions
file_test_read(FileName, ChunkSize, Type) -&amp;gt;
 {ok, File} = file_open(FileName, ChunkSize, Type),
 eof = file_test_read_loop(File, file_read(File)),
 file_close(File).

file_test_read_loop(File, {ok, _}) -&amp;gt;
 file_test_read_loop(File, file_read(File));
file_test_read_loop(_, eof) -&amp;gt;
 eof.
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5950739624271007232-3218067661678418389?l=pichis-blog.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/3218067661678418389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=3218067661678418389' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/3218067661678418389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/3218067661678418389'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/wide-finder-project-fold.html' title='Wide Finder Project - fold&amp;reduce'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-6256206119426144973</id><published>2007-10-07T23:01:00.000-07:00</published><updated>2007-10-07T23:32:26.372-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='erlang'/><title type='text'>Is bfile faster than old erlang file?</title><content type='html'>&lt;a href="http://steve.vinoski.net/blog/"&gt;Steve Vinoski&lt;/a&gt; is using  &lt;a href="http://patricklogan.blogspot.com/2007/09/when-it-rains-it-pours-io-all-over.html"&gt;klacke’s &lt;code&gt;bfile&lt;/code&gt; module&lt;/a&gt; in &lt;a 
href="http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/"&gt;his&lt;/a&gt; &lt;a href="http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder"&gt;Wide Finder Project&lt;/a&gt; work, but I don't know why &lt;code&gt;bfile&lt;/code&gt; should be faster than erlang OTP &lt;code&gt;file&lt;/code&gt;. Well, then I measured. I tried Steve's &lt;a href="http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-31"&gt;read test&lt;/a&gt; and my test on my old home desktop (model name : AMD Athlon(tm) processor, stepping : 2, cpu MHz : 1199.805, cache size : 256 KB).&lt;pre&gt;-module(readold).
-export([start/1, start/2]).
-compile([native]).

scan_file(F, Readsize, Total) -&gt;
   Rd = file:read(F, Readsize),
   case Rd of
       {ok, Bin} -&gt; scan_file(F, Readsize, size(Bin)+Total);
       eof -&gt; Total
   end.
scan_file(F, Readsize) -&gt; scan_file(F, Readsize, 0).

start(File, Readsize) -&gt;
   {ok, F} = file:open(File, [raw, binary, read]),
   T = scan_file(F, Readsize),
   io:format("read ~p bytes~n", [T]),
   file:close(F).
start(File) -&gt;
   start(File, 512*1024).&lt;/pre&gt;
And there are results here:&lt;pre&gt;2&gt; timer:tc(readold,start,["o1M.ap"]).
read 200995500 bytes
{1041306,ok}
3&gt; timer:tc(readold,start,["o1M.ap"]).
read 200995500 bytes
{836876,ok}
4&gt; c(readold).
{ok,readold}
5&gt; timer:tc(readold,start,["o1M.ap"]).
read 200995500 bytes
{837501,ok}
6&gt; timer:tc(read,start,["o1M.ap"]).  
read 200995500 bytes
{1353678,true}
7&gt; timer:tc(read,start,["o1M.ap"]).
read 200995500 bytes
{1237174,true}
8&gt; timer:tc(read,start,["o1M.ap"]).
read 200995500 bytes
{1318029,true}
9&gt; timer:tc(readold,start,["o1M.ap"]).
read 200995500 bytes
{856662,ok}&lt;/pre&gt;
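The timings vary from run to run (the first &lt;code&gt;readold&lt;/code&gt; run took about 1.04 s, the later ones about 0.84 to 0.86 s), so taking the fastest of several runs gives a somewhat fairer comparison than single runs. A minimal sketch, using a hypothetical &lt;code&gt;bench&lt;/code&gt; module that was not part of these measurements:&lt;pre&gt;-module(bench).
-export([best_of/4]).

%% Call Module:Function with Args Times times and return the fastest
%% run in microseconds (timer:tc/3 returns {Microseconds, Result}).
best_of(Times, Module, Function, Args) when Times &gt; 0 -&gt;
    Runs = lists:map(fun(_) -&gt;
                         {Micros, _Result} = timer:tc(Module, Function, Args),
                         Micros
                     end, lists:seq(1, Times)),
    lists:min(Runs).&lt;/pre&gt;
For example, &lt;code&gt;bench:best_of(5, readold, start, ["o1M.ap"])&lt;/code&gt; and &lt;code&gt;bench:best_of(5, read, start, ["o1M.ap"])&lt;/code&gt; give one number per reader; both &lt;code&gt;start/1&lt;/code&gt; functions still print their own output on every run.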
In general, I don't see why Erlang's &lt;code&gt;file&lt;/code&gt; should be slower. On my old home desktop &lt;code&gt;bfile&lt;/code&gt; is about 45% slower than &lt;code&gt;file&lt;/code&gt;, so why should it be faster anywhere else? I tested on Linux; maybe &lt;code&gt;bfile&lt;/code&gt;, which uses the BSD file implementation, is faster on Darwin, a BSD derivative? Either way, the &lt;code&gt;file&lt;/code&gt; implementation is fast enough on my Erlang/OTP R11B-5.</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/6256206119426144973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=6256206119426144973' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/6256206119426144973'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/6256206119426144973'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/is-bfile-faster-than-old-erlang-file.html' title='Is bfile faster than old erlang file?'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-5841405227181200374</id><published>2007-10-07T11:10:00.001-07:00</published><updated>2007-10-08T01:48:12.788-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hipe'/><category 
scheme='http://www.blogger.com/atom/ns#' term='erlang'/><category scheme='http://www.blogger.com/atom/ns#' term='multi-core'/><title type='text'>erlang-base-hipe.deb don't contain native compiled modules</title><content type='html'>When I measured &lt;a href="http://www.blogtrader.net/page/dcaoyuan"&gt;Caoyuan's&lt;/a&gt; and &lt;a href="http://pichis-blog.blogspot.com/2007/10/binaries-realy-faster-than-lists.html"&gt;my&lt;/a&gt; &lt;a href="http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder"&gt;Wide Finder Project&lt;/a&gt; solutions, I found my binary solution faster than the list one, but Caoyuan measured the opposite. So I looked for the reason. Caoyuan is using Mac OS X and I'm using Debian. I don't know whether the Mac OS X port of Erlang has its base modules native compiled (with &lt;a href="http://www.it.uu.se/research/group/hipe/"&gt;HiPE&lt;/a&gt;), but I checked my Debian one:&lt;pre&gt;5&gt; proplists:get_value(compile, lists:module_info()).
[{options,[{inline,[{merge3_12,7},
                 {merge3_21,7},
                 {rmerge3_12,7},
                 {rmerge3_21,7}]},
        {inline,[{umerge3_12,8},
                 {umerge3_21,8},
                 {rumerge3_12a,7},
                 {rumerge3_12b,8}]},
        {inline,[{keymerge3_12,12},
                 {keymerge3_21,12},
                 {rkeymerge3_12,12},
                 {rkeymerge3_21,12}]},
        {inline,[{ukeymerge3_12,13},
                 {ukeymerge3_21,13},
                 {rukeymerge3_12a,11},
                 {rukeymerge3_21a,13},
                 {rukeymerge3_12b,12},
                 {rukeymerge3_21b,12}]},
        {cwd,"/tmp/buildd/erlang-11.b.5dfsg/lib/stdlib/src"},
        {outdir,"/tmp/buildd/erlang-11.b.5dfsg/lib/stdlib/src/../ebin"},
        {i,"/tmp/buildd/erlang-11.b.5dfsg/lib/stdlib/src/../include"},
        {i,"/tmp/buildd/erlang-11.b.5dfsg/lib/stdlib/src/../../kernel/include"},
        warn_obsolete_guard,
        debug_info,
        {inline,[{merge3_12,7},
                 {merge3_21,7},
                 {rmerge3_12,7},
                 {rmerge3_21,7}]},
        {inline,[{umerge3_12,8},
                 {umerge3_21,8},
                 {rumerge3_12a,7},
                 {rumerge3_12b,8}]},
        {inline,[{keymerge3_12,12},
                 {keymerge3_21,12},
                 {rkeymerge3_12,12},
                 {rkeymerge3_21,12}]},
        {inline,[{ukeymerge3_12,13},
                 {ukeymerge3_21,13},
                 {rukeymerge3_12a,11},
                 {rukeymerge3_21a,13},
                 {rukeymerge3_12b,12},
                 {rukeymerge3_21b,12}]}]},
{version,"4.4.5"},
{time,{2007,9,28,11,10,32}},
{source,"/tmp/buildd/erlang-11.b.5dfsg/lib/stdlib/src/lists.erl"}]&lt;/pre&gt;
There is no &lt;code&gt;native&lt;/code&gt; option. If the Mac OS X port is native compiled, that could explain this unexpected difference.
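The same check can be wrapped in a small function. A minimal sketch (the module name &lt;code&gt;native_check&lt;/code&gt; is hypothetical): it only looks for the &lt;code&gt;native&lt;/code&gt; atom among the compile options recorded in the module, the same information that &lt;code&gt;proplists:get_value(compile, lists:module_info())&lt;/code&gt; prints above.&lt;pre&gt;-module(native_check).
-export([is_native/1]).

%% true if the compile options recorded in Module contain the atom native.
is_native(Module) -&gt;
    Options = proplists:get_value(options, Module:module_info(compile), []),
    lists:member(native, Options).&lt;/pre&gt;
On the Debian build above, &lt;code&gt;native_check:is_native(lists)&lt;/code&gt; would return &lt;code&gt;false&lt;/code&gt;; for the natively compiled &lt;code&gt;yecc&lt;/code&gt; shown below it would return &lt;code&gt;true&lt;/code&gt;, because its options list contains &lt;code&gt;native&lt;/code&gt;.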

I tried to build my own &lt;code&gt;erlang-base-hipe&lt;/code&gt; package from the source package, but I don't know how to switch the option on in the build process. I tried &lt;code&gt;debian/rules configure-hipe&lt;/code&gt;, but it didn't work, so I searched for where the &lt;code&gt;native&lt;/code&gt; option is used.
&lt;pre&gt;$ grep -r +native .
./lib/asn1/src/Makefile:ERL_COMPILE_FLAGS += +native
./lib/megaco/src/app/megaco.mk:ERL_COMPILE_FLAGS += +native
./lib/megaco/test/Makefile:ERL_COMPILE_FLAGS += +native -Dmegaco_hipe_special=true
./lib/megaco/examples/meas/Makefile:ERL_COMPILE_FLAGS += +native
./README:       erlc +native Module.erl&lt;/pre&gt;
Then I tried
&lt;pre&gt;export ERL_COMPILE_FLAGS=+native&lt;/pre&gt;
It had no effect on &lt;code&gt;binary-erlang-base&lt;/code&gt; and only produced a warning message, but an error occurred during &lt;code&gt;binary-erlang-base-hipe&lt;/code&gt;:
&lt;pre&gt;erlc -W  +native +debug_info +debug_info +debug_info +warn_obsolete_guard -I/home/hynek/work/erlang-11.b.5dfsg/lib/stdlib/include -o../ebin yecc.erl
./yecc.erl:none: internal error in native_compile;
crash reason: {undef,[{hipe,compile,
                            [yecc,
                             [],
                             &lt;&lt;70,79,82,49,0,1,91,124,66,69,65,77,65,116,111,
...
97,252,150,236,255,7,193,199,127,8,0,0&gt;&gt;,
                             []]},
                      {compile,native_compile_1,1},
                      {compile,'-internal_comp/4-anonymous-1-',2},
                      {compile,fold_comp,3},
                      {compile,internal_comp,4},
                      {compile,internal,3}]}
make[4]: Leaving directory `/home/hynek/work/erlang-11.b.5dfsg/lib/parsetools/src'
make[3]: Leaving directory `/home/hynek/work/erlang-11.b.5dfsg/lib/parsetools'
make[2]: Leaving directory `/home/hynek/work/erlang-11.b.5dfsg/lib'
make[1]: Leaving directory `/home/hynek/work/erlang-11.b.5dfsg'&lt;/pre&gt;
Then I tested whether &lt;code&gt;yecc&lt;/code&gt; can be compiled to native code, and it can:
&lt;pre&gt;3&gt; c("/usr/lib/erlang/lib/parsetools-1.4.1.1/src/yecc.erl", [native, {i, "/usr/lib/erlang/lib/stdlib-1.14.5/include/"}, {outdir, "."}]).
{ok,yecc}
4&gt; proplists:get_value(compile, yecc:module_info()).                                                                                    
[{options,[{inline,[{compute_closure,3}]},
           {nowarn_unused_function,{function_name,2}},
           {inline,[{set_empty,0}]},
           {inline,[{set_member,2}]},
           {inline,[{set_delete,2}]},
           {inline,[{set_union,2}]},
           {inline,[{set_is_subset,2}]},
           {inline,[{is_terminal,2}]},
           native,
           {i,"/usr/lib/erlang/lib/stdlib-1.14.5/include/"},
           {outdir,"."},
           {inline,[{compute_closure,3}]},
           {nowarn_unused_function,{function_name,2}},
           {inline,[{set_empty,0}]},
           {inline,[{set_member,2}]},
           {inline,[{set_delete,2}]},
           {inline,[{set_union,2}]},
           {inline,[{set_is_subset,2}]},
           {inline,[{is_terminal,2}]}]},
 {version,"4.4.5"},
 {time,{2007,10,7,17,53,55}},
 {source,"/usr/lib/erlang/lib/parsetools-1.4.1.1/src/yecc.erl"}]
&lt;/pre&gt;
&lt;code&gt;yecc&lt;/code&gt; (and almost all other modules) can be native compiled, but I don't know how to make the package build do it. I'm totally lost in the Debian packaging system and don't know what &lt;code&gt;dpkg-buildpackage&lt;/code&gt; does. Doing it with the installed version is difficult, because many packages need special compile options, at least &lt;code&gt;{i,"/usr/lib/erlang/lib/stdlib-1.14.5/include/"}&lt;/code&gt; for included &lt;code&gt;.hrl&lt;/code&gt; files, and so on.

I would be glad if anyone could give me some advice on how to build the package with native compiled modules, or how to recompile just the modules from the source package as a workaround.</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/5841405227181200374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=5841405227181200374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5841405227181200374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5841405227181200374'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/erlang-base-hipe-dont-contain-native.html' title='erlang-base-hipe.deb don&apos;t contain native compiled modules'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-3141454820454505989</id><published>2007-10-06T13:40:00.000-07:00</published><updated>2007-10-06T14:32:39.841-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='erlang'/><category scheme='http://www.blogger.com/atom/ns#' term='multi-core'/><title type='text'>Binaries really faster than lists</title><content type='html'>&lt;a href="http://www.blogtrader.net/page/dcaoyuan"&gt;Caoyuan&lt;/a&gt; did his second &lt;a 
href="http://www.blogtrader.net/page/dcaoyuan?entry=tim_bray_s_erlang_exercise1" title="Tim Bray's Erlang Exercise on Large Dataset Processing - Round II"&gt;round&lt;/a&gt; of the &lt;a href="http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder"&gt;Wide Finder Project&lt;/a&gt; using lists, expecting list traversal to be &lt;a href="http://blogtrader.net/page/dcaoyuan?entry=tim_bray_s_erlang_exercise"&gt;faster&lt;/a&gt; than binary traversal. But he made a small mistake. He assumed that &lt;code&gt;timer:tc(tbray, travel_list, [binary_to_list(Bin)])&lt;/code&gt; includes the time spent in &lt;code&gt;binary_to_list(Bin)&lt;/code&gt;, but it does not: Erlang evaluates arguments before the call, so &lt;code&gt;binary_to_list(Bin)&lt;/code&gt; runs before &lt;code&gt;timer:tc&lt;/code&gt; is called and only its result is passed in. There are more performance problems in his solution. He often uses &lt;code&gt;lists:reverse&lt;/code&gt; and &lt;code&gt;++&lt;/code&gt;, which also has to copy its whole first operand (in effect reversing it twice). The bigger memory footprint of lists and the frequent reversing must hurt performance. When I wrote the same algorithm using binaries, I got a 3 times speed-up.
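A small sketch makes the timing pitfall concrete. The module &lt;code&gt;tc_demo&lt;/code&gt; and its &lt;code&gt;travel_list/1&lt;/code&gt; are hypothetical stand-ins, not Caoyuan's code. The first measurement repeats the mistaken pattern; the second wraps the conversion and the traversal in a fun and times it with &lt;code&gt;timer:tc/3&lt;/code&gt; through &lt;code&gt;erlang:apply/2&lt;/code&gt;, so both steps are inside the timed call.
&lt;pre&gt;&lt;code&gt;
-module(tc_demo).
-export([travel_list/1, compare/1]).

%% Stand-in for a list traversal: walk the list once.
travel_list([_ | T]) -&amp;gt; travel_list(T);
travel_list([]) -&amp;gt; ok.

%% Returns {TraversalOnly, ConversionAndTraversal} in microseconds.
compare(Bin) -&amp;gt;
    %% binary_to_list(Bin) is evaluated before timer:tc/3 is entered,
    %% so only travel_list/1 is timed here.
    {T1, ok} = timer:tc(?MODULE, travel_list, [binary_to_list(Bin)]),
    %% Here the conversion happens inside the timed fun.
    {T2, ok} = timer:tc(erlang, apply,
                        [fun() -&amp;gt; travel_list(binary_to_list(Bin)) end, []]),
    {T1, T2}.
&lt;/code&gt;&lt;/pre&gt;
My binary version of the same algorithm follows.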
&lt;pre&gt;&lt;code&gt;
-module(tbray2).

-compile([native, {hipe,[o3]}]).

-export([start/1, start/2]).

-record(context, {main,
                 dict,
                 chunkNum,
                 processedNum = 0}).

start(FileName) -&amp;gt; start(FileName, 1024*32).
start(FileName, ChunkSize) -&amp;gt;
   Start = now(),
   Main = self(),
   Collector = spawn_link(fun () -&amp;gt; collect_loop(#context{main = Main,
                                                     dict = dict:new(),
                                                     processedNum = 0}) end),
   ChunkNum = foreach_chunk(
           fun(Chunk) -&amp;gt;
               spawn_link(fun() -&amp;gt; Collector ! scan_lines(Chunk) end)
           end,
           FileName,
           ChunkSize
       ),
   Collector ! {chunkNum, ChunkNum},
  
   %% don't terminate, wait here, until all tasks done.
   receive
       stop -&amp;gt; io:format("Time: ~p ms~n", [timer:now_diff(now(), Start) / 1000])
   end.

foreach_chunk(Fun, FileName, SizeOfChunk) -&amp;gt;
   {ok, File} = file:open(FileName, [raw, binary]),
   N = foreach_chunk(Fun, File, &amp;lt;&amp;lt;&amp;gt;&amp;gt;, SizeOfChunk, 0),
   file:close(File),
   N.

foreach_chunk(Fun, File, PrevRest, SizeOfChunk, N) -&amp;gt;
   {Chunk, Rest} = read_chunk(File, PrevRest, SizeOfChunk),
   Fun(Chunk),
   case Rest of
       &amp;lt;&amp;lt;&amp;gt;&amp;gt; -&amp;gt; N+1;
       _ -&amp;gt; foreach_chunk(Fun, File, Rest, SizeOfChunk, N+1)
   end.

read_chunk(File, PrevRest, N) -&amp;gt;
   case file:read(File, N) of
       {ok, B} -&amp;gt;
           {Line, Rest} = split_on_nl(B),
           Chunk = &amp;lt;&amp;lt;PrevRest/binary, Line/binary&amp;gt;&amp;gt;,
           case Rest of
               &amp;lt;&amp;lt;&amp;gt;&amp;gt; -&amp;gt;
                   read_chunk(File, Chunk, N);
               _ -&amp;gt;
                   {Chunk, Rest}
           end;
       eof -&amp;gt;
           {PrevRest, &amp;lt;&amp;lt;&amp;gt;&amp;gt;}
   end.

split_on_nl(B) -&amp;gt; split_on_nl(B, 0, size(B)).

split_on_nl(B, N, S) when N &amp;lt; S -&amp;gt; case B of
       &amp;lt;&amp;lt;Line:N/binary, $\n, Tail/binary&amp;gt;&amp;gt; -&amp;gt; {Line, Tail};
       _ -&amp;gt; split_on_nl(B, N+1, S)
   end;
split_on_nl(B, _, _) -&amp;gt; {B, &amp;lt;&amp;lt;&amp;gt;&amp;gt;}.

collect_loop(#context{main=Main,
                     dict=Dict,
                     chunkNum=ChunkNum,
                     processedNum=ProcessedNum}=Context) -&amp;gt;
   case ProcessedNum of
       ChunkNum -&amp;gt;
           print_result(Dict),
           Main ! stop;
       _ -&amp;gt;
           receive
               {chunkNum, N} -&amp;gt; collect_loop(Context#context{chunkNum = N});
               DictX -&amp;gt;
                   collect_loop(Context#context{
                       dict = dict:merge(fun (_, V1, V2) -&amp;gt; V1 + V2 end, Dict, DictX),
                       processedNum = ProcessedNum+1})
           end
   end.

print_result(Dict) -&amp;gt;
   %% Ten most requested keys; lists:sublist/2 also copes with fewer than ten
   %% keys, where matching [R1, ..., R10 | _] would crash with badmatch.
   Top10 = lists:sublist(
       lists:reverse(lists:keysort(2, dict:to_list(Dict))), 10),
   lists:foreach(fun ({Word, Count}) -&amp;gt;
                         io:format("~p get requests for ~s~n", [Count, Word])
                 end, Top10).

scan_lines(Bin) -&amp;gt; scan_lines(Bin, dict:new()).
scan_lines(Bin, Dict) -&amp;gt;
   case naive_search(Bin) of
       {ok, Key, Rest} -&amp;gt;
           scan_lines(
               element(2, split_on_nl(Rest)),
               dict:update_counter(Key, 1, Dict));
       false -&amp;gt; Dict
   end.

%% naive_search(Binary()) -&amp;gt; false | {ok, Key, Rest}
naive_search(B) -&amp;gt; naive_search(B, 0, size(B)-18).

naive_search(B, N, S) when N &amp;lt; S -&amp;gt;
   case B of
       &amp;lt;&amp;lt;_:N/binary, "GET /ongoing/When/", Rest/binary&amp;gt;&amp;gt; -&amp;gt;
           case keyMatch(Rest) of
               Result = {ok, _Key, _Rest2} -&amp;gt; Result;
               false -&amp;gt; naive_search(Rest)
           end;
       _ -&amp;gt; naive_search(B, N+1, S)
   end;
naive_search(_, _, _) -&amp;gt; false.

%% keyMatch(Binary()) -&amp;gt; false | {ok, Key, Rest}
keyMatch(&amp;lt;&amp;lt;C, _/binary&amp;gt;&amp;gt;) when C == $ ; C == $. -&amp;gt; false;   % empty
keyMatch(B) -&amp;gt; keyMatch(B, 1, size(B)).

keyMatch(B, N, S) when N&amp;lt;S -&amp;gt;
   case B of
       % end with space
       &amp;lt;&amp;lt;Key:N/binary, $ , Rest/binary&amp;gt;&amp;gt; -&amp;gt; {ok, Key, Rest};
       &amp;lt;&amp;lt;_:N/binary, $., _/binary&amp;gt;&amp;gt; -&amp;gt; false;
       _ -&amp;gt; keyMatch(B, N+1, S)
   end;
keyMatch(_, _, _) -&amp;gt; false.&lt;/code&gt;&lt;/pre&gt;The result is a faster program that consumes less memory. The problem is partitioned the same way as Caoyuan did it, and I suppose it scales the same way, but I can't test it because I don't have a multi-core computer.</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/3141454820454505989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=3141454820454505989' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/3141454820454505989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/3141454820454505989'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/binaries-realy-faster-than-lists.html' title='Binaries really faster than lists'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5950739624271007232.post-5975311151713005410</id><published>2007-10-03T07:34:00.001-07:00</published><updated>2007-10-03T08:08:01.646-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='erlang'/><title type='text'>Keep your mailbox empty</title><content type='html'>Matthias wrote &lt;a href="http://www.lshift.net/blog/2007/10/01/too-much-mail-is-bad-for-you"&gt;Too much mail is bad for you&lt;/a&gt;. 
Yes, it's true. He solved the problem with synchronous communication between producer and consumer, but this solution blocks the producer and is not an "&lt;a href="http://www.lshift.net/blog/2007/10/01/too-much-mail-is-bad-for-you#comments"&gt;appropriate solution in general&lt;/a&gt;" (second comment). I think one good solution is to write your own queue manager and keep the process mailbox empty. It is essentially the solution Vlad recommends in that discussion. So how do you do it?&lt;pre class="code"&gt;
-module(mqueue).

-export([timed_run/3]).

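%% Producer: sends N messages, then does a done/done handshake.
%% WithAck = true:  sends straight to the consumer and waits for an ack
%%                  after every message (synchronous, the producer blocks).
%% WithAck = false: sends to the queue keeper without waiting, so the
%%                  producer is never blocked.
producer(0, ConsumerPid, _WithAck) -&gt;
  ConsumerPid ! {done, self()},
  receive
      done -&gt; ok
  end;
producer(N, ConsumerPid, false) -&gt;
  ConsumerPid ! {msg, self()},
  producer(N-1, ConsumerPid, false);
producer(N, ConsumerPid, true) -&gt;
  ConsumerPid ! {acked_msg, self()},
  receive
      ack -&gt; ok
  end,
  producer(N-1, ConsumerPid, true).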

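%% Consumer: for every message it makes M round trips to the echo process.
%% The receive in call_echo/2 is selective (it waits for hello), so each
%% round trip must scan past every message still queued in this mailbox.
%% That is why a long mailbox makes the consumer slow.
%% An acked_msg is acknowledged as soon as it arrives, before the work.
consumer(M, EchoPid) -&gt;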
  receive
      {msg, _From} -&gt;
          call_echo(M, EchoPid),
          consumer(M, EchoPid);
      {acked_msg, From} -&gt;
          From ! ack,
          call_echo(M, EchoPid),
          consumer(M, EchoPid);
      {done, From} -&gt;
          EchoPid ! done,
          From ! done
  end.

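%% Queue keeper, idle state: nothing is in flight to the consumer.
%% A new message is forwarded at once as an acked_msg. done is passed on
%% to the consumer, and its reply is relayed back to the producer.
queue_keeper_empty(ConsumerPid) -&gt;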
  receive
      {msg, _From} -&gt;
          ConsumerPid ! {acked_msg, self()},
          queue_keeper_waiting(queue:new(), ConsumerPid);
      {done, From} -&gt;
          ConsumerPid ! {done, self()},
          receive
              done -&gt;
                  From ! done
          end
  end.

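%% Queue keeper, busy state: one message is in flight. New messages are
%% buffered in a queue term (not in anyone's mailbox), and the next one is
%% sent only when the consumer acks. A {done, From} arriving here matches
%% no clause, so it stays in the mailbox until the queue drains and
%% queue_keeper_empty/1 picks it up.
queue_keeper_waiting(Q, ConsumerPid) -&gt;
  receive
      {msg, _From} -&gt;
          queue_keeper_waiting(
              queue:in({acked_msg, self()}, Q),
              ConsumerPid
          );
      ack -&gt;
          case queue:out(Q) of
              {{value, Msg}, Q1} -&gt;
                  ConsumerPid ! Msg,
                  queue_keeper_waiting(Q1, ConsumerPid);
              {empty, _} -&gt;
                  queue_keeper_empty(ConsumerPid)
          end
  end.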

%% Makes M synchronous round trips to the echo process.
call_echo(0, _EchoPid) -&gt;
  ok;
call_echo(M, EchoPid) -&gt;
  EchoPid ! {hello, self()},
  receive
      hello -&gt; call_echo(M-1, EchoPid)
  end.

%% Echo: sends every {Msg, From} straight back to From as Msg; stops on done.
echo() -&gt;
  receive
      {Msg, From} -&gt;
          From ! Msg,
          echo();
      done -&gt; ok
  end.

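%% WithAck = true:  the producer talks to the consumer synchronously.
%% WithAck = false: the producer sends through a queue keeper process.
run(N, M, WithAck) -&gt;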
  EchoPid     = spawn_link(fun echo/0),
  ConsumerPid = spawn_link(
                  fun () -&gt;
                          consumer(M, EchoPid) end),
  case WithAck of
      true -&gt;
          producer(N, ConsumerPid, WithAck);
      false -&gt;
          producer(N, spawn_link(
                  fun() -&gt;
                      queue_keeper_empty(ConsumerPid)
                  end
              ), WithAck)
  end.

%% Elapsed time of F() in microseconds.
time(F) -&gt;
  {Micros, _Result} = timer:tc(F),
  Micros.

%% Exported entry point: N messages, M echo round trips per message.
timed_run(N, M, WithAck) -&gt;
  time(fun() -&gt; run(N, M, WithAck) end).
&lt;/pre&gt;
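The following sketch is not part of the benchmark above. It only makes the effect on the mailbox visible. The module name mailbox_demo, the paused consumer (which never matches anything it is sent, standing in for a busy one) and the 100 ms wait are all assumptions of this illustration. It sends N messages to that consumer, either directly or through a one-in-flight keeper built like queue_keeper_empty/1 and queue_keeper_waiting/2, and then asks erlang:process_info/2 for the consumer's message_queue_len.
&lt;pre class="code"&gt;
-module(mailbox_demo).
-export([compare/1, backlog/2]).

%% Hypothetical demo, not from the post: count how many messages pile up
%% in the mailbox of a consumer that never gets around to reading them.
compare(N) -&gt;
    {backlog(N, false), backlog(N, true)}.

backlog(N, UseKeeper) -&gt;
    Consumer = spawn(fun paused_consumer/0),
    Target = case UseKeeper of
                 true  -&gt; spawn(fun() -&gt; keeper_idle(Consumer) end);
                 false -&gt; Consumer
             end,
    lists:foreach(fun(I) -&gt; Target ! {msg, I} end, lists:seq(1, N)),
    %% Assumption: 100 ms is enough for the keeper to forward its first
    %% message and move the rest from its mailbox into its queue.
    timer:sleep(100),
    {message_queue_len, Len} = erlang:process_info(Consumer, message_queue_len),
    exit(Target, kill),
    exit(Consumer, kill),
    Len.

%% Stands in for a busy consumer: nothing we send ever matches here.
paused_consumer() -&gt;
    receive go -&gt; ok end.

%% Idle keeper: nothing in flight, so forward the message at once.
keeper_idle(Consumer) -&gt;
    receive
        {msg, _} = Msg -&gt;
            Consumer ! {Msg, self()},
            keeper_busy(queue:new(), Consumer)
    end.

%% Busy keeper: one message is in flight. Buffer the rest in a queue
%% term until the consumer sends ack.
keeper_busy(Q, Consumer) -&gt;
    receive
        {msg, _} = Msg -&gt;
            keeper_busy(queue:in(Msg, Q), Consumer);
        ack -&gt;
            case queue:out(Q) of
                {{value, Msg}, Q1} -&gt;
                    Consumer ! {Msg, self()},
                    keeper_busy(Q1, Consumer);
                {empty, _} -&gt;
                    keeper_idle(Consumer)
            end
    end.
&lt;/pre&gt;
On an otherwise idle node, mailbox_demo:compare(1000) should give {1000, 1}. Without the keeper, all 1000 messages wait in the consumer's mailbox. With the keeper, only the single in-flight message does, and the other 999 are buffered in the keeper's queue. If the keeper were not scheduled within the 100 ms, the second number could be 0 instead.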
Running through the queue keeper (timed_run(N, M, false)) is only about 25% slower than the synchronous version (timed_run(N, M, true)), but the producer is never blocked.</content><link rel='replies' type='application/atom+xml' href='http://pichis-blog.blogspot.com/feeds/5975311151713005410/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5950739624271007232&amp;postID=5975311151713005410' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5975311151713005410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5950739624271007232/posts/default/5975311151713005410'/><link rel='alternate' type='text/html' href='http://pichis-blog.blogspot.com/2007/10/keep-your-mailbox-empty.html' title='Keep your mailbox empty'/><author><name>Pichi</name><uri>http://www.blogger.com/profile/12662180723203160349</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
