riak, part 2

  1. SecondaryIndexes (aka 2i)
  2. RiakSearch
  3. RiakLinks

SecondaryIndexes (aka 2i)

at write time you tag an object with queryable values: index attribute and index value (via metadata):
  Header: 'x-riak-index-index_attrib:index_value'
at read time you fetch objects with the appropriate indexes using:
  GET /buckets/bucket_name/index/index_attrib/index_value
riak 2i currently requires the LevelDB backend, so the config file app.config must contain:
 {riak_kv, [
    ...
    {storage_backend, riak_kv_eleveldb_backend},
    ...

how it works

SecondaryIndexes are a list of key/value pairs that are similar to http headers. at write time, objects are tagged with index entries consisting of key/value metadata

an application can modify the indexes for an object by reading an object, adding or removing index entries, and then writing the object. the object's value and its indexes should be thought of as a single unit. there is no way to alter the indexes of an object independently from the value of an object, and vice versa

indexing is atomic, and is updated in real-time when writing an object. this means that an object will be present in future index queries as soon as the write operation completes.
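
A minimal sketch of the tagging and lookup steps described above, in Python. It only builds the header and the path shown earlier and sends nothing to a server; the helper names `index_header` and `index_query_path` are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote


def index_header(attrib: str, value) -> Tuple[str, str]:
    """Return the (name, value) HTTP header that tags an object at write time."""
    return (f"x-riak-index-{attrib}", str(value))


def index_query_path(bucket: str, attrib: str, value) -> str:
    """Return the path of the exact-match index query used at read time."""
    parts = [quote(str(p), safe="") for p in (bucket, attrib, value)]
    return "/buckets/{}/index/{}/{}".format(*parts)


print(index_header("dep_bin", 12))
print(index_query_path("users", "dep_bin", 12))
```

Because the index entries travel as headers on the same write as the value, there is no separate call that changes only the indexes.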

features

  • allows two types of secondary attributes:
    1. integers
    2. strings
  • allows querying by:
    1. exact match
    2. range (on one index)
  • query results can be used as input to a MapReduce query
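
A hedged sketch of the two attribute types and the range query. It assumes the usual riak convention that index names end in `_int` for integers and `_bin` for strings, and that a range query appends the start and end values to the index path. The field name `age` and both helper names are hypothetical.

```python
def index_name(field: str, value) -> str:
    """Pick the index name suffix from the value type (_int or _bin)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a supported index type")
    return f"{field}_int" if isinstance(value, int) else f"{field}_bin"


def range_query_path(bucket: str, index: str, start, end) -> str:
    """Path of a range query over a single index (assumption: /start/end)."""
    return f"/buckets/{bucket}/index/{index}/{start}/{end}"


print(index_name("age", 30))
print(range_query_path("users", "age_int", 20, 40))
```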

when to use

  • to find data based on terms other than an object's bucket/key pair
  • the value being stored is an opaque blob (like a binary file)

when not to use

  • in composite queries
  • if pagination is a requirement
  • if ordered result sets are a requirement

inserting

      $>curl http://localhost:8098/buckets/users/keys/john \
      -H 'x-riak-index-dep_bin: 12' \
      -H 'x-riak-index-rol_bin: worker' \
      -d '{"field1": 1, "field2": "one", "field3": true}'
      $>curl http://localhost:8098/buckets/users/keys/mary \
      -H 'x-riak-index-dep_bin: 34' \
      -H 'x-riak-index-rol_bin: worker' \
      -d '{"field1": 2, "field2": "two", "field3": false}'
      $>curl http://localhost:8098/buckets/users/keys/tom \
      -H 'x-riak-index-dep_bin: 12' \
      -H 'x-riak-index-rol_bin: super' \
      -d '{"field1": 3, "field2": "three", "field3": true}'
    
    bucket/key      users/john, users/mary, users/tom
    2i "dep_bin"    12, 34
    2i "rol_bin"    "worker", "super"

    querying

      $> curl http://localhost:8098/buckets/users/index/dep_bin/12
      {"keys":["john","tom"]}
      $> curl http://localhost:8098/buckets/users/index/rol_bin/worker
      {"keys":["john","mary"]}
      $> curl http://localhost:8098/buckets/users/index/dep_bin/34
      {"keys":["mary"]}
      $> curl http://localhost:8098/buckets/users/index/rol_bin/super
      {"keys":["tom"]}
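
To make the exact-match behaviour concrete, here is a small in-memory model of the same three writes and four queries. It is only an illustration of the lookup semantics, not how riak stores indexes; `build_index` and `query` are hypothetical names, and keys are sorted only to keep the output stable.

```python
from collections import defaultdict

# The three writes from the "inserting" example: key -> {index name: value}.
OBJECTS = {
    "john": {"dep_bin": "12", "rol_bin": "worker"},
    "mary": {"dep_bin": "34", "rol_bin": "worker"},
    "tom": {"dep_bin": "12", "rol_bin": "super"},
}


def build_index(objects):
    """Map (index name, index value) -> set of keys tagged with that pair."""
    index = defaultdict(set)
    for key, tags in objects.items():
        for name, value in tags.items():
            index[(name, value)].add(key)
    return index


def query(index, name, value):
    """Exact-match lookup over one index."""
    return {"keys": sorted(index.get((name, value), set()))}


INDEX = build_index(OBJECTS)
print(query(INDEX, "dep_bin", "12"))  # {'keys': ['john', 'tom']}
```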
    

    example from here

    #!/bin/bash
    
      Bucket="bar"; Host="localhost"; Port="8098"
    
      register ()
      {
        curl -s http://$Host:$Port/riak/$Bucket -H 'content-type: text/plain' \
        -H "x-riak-index-name_bin: $1" -d "$1"
      }
    
      store ()
      {
        curl -s http://$Host:$Port/buckets/$Bucket/keys -d "$2" \
        -H "x-riak-index-dev_bin: $1" -H 'content-type: application/json' 
      }
    
      populate ()
      {
        register "dev1"; register "dev2"; register "dev3";
    
        store "dev1" '{"id":"d1", "ts":1, "dat":"d_1-1"}'
        store "dev1" '{"id":"d1", "ts":2, "dat":"d_1-2"}'
        store "dev1" '{"id":"d1", "ts":3, "dat":"d_1-3"}'
    
        store "dev2" '{"id":"d2", "ts":1, "dat":"d_2-1"}'
        store "dev2" '{"id":"d2", "ts":2, "dat":"d_2-2"}'
        store "dev2" '{"id":"d2", "ts":3, "dat":"d_2-3"}'
    
        store "dev3" '{"id":"d3", "ts":1, "dat":"d_3-1"}'
        store "dev3" '{"id":"d3", "ts":2, "dat":"d_3-2"}'
        store "dev3" '{"id":"d3", "ts":3, "dat":"d_3-3"}'
    
        echo populated
      }
    
      fetch_list ()
      {
        Data=$(cat <<EOF
        {"inputs":{"bucket":"$Bucket", "index":"name_bin", "start":"a", "end":"z"},
         "query":[{"map": {"language":"erlang",
                           "module":"riak_kv_mapreduce",
                           "function":"map_object_value"}}]}
      EOF
      )
        echo -n $Data | curl -s http://$Host:$Port/mapred \
        -H 'content-type: application/json' --data-binary @-
        echo
      }  
    
      query_data ()
      {
        Data=$(cat <<EOF
        { "inputs":{ "bucket":"$Bucket", "index":"dev_bin", "key":"$1"},
          "query":[{"map":{"language":"javascript", 
                           "arg":{"ts_min":"$2", "ts_max":"$3"},
                           "source":"function (v, kd, arg)
                                     {
                                       var ts_min = arg.ts_min;
                                       var ts_max = arg.ts_max;
                                       var data = Riak.mapValuesJson(v)[0];
                                       if (ts_min<=data.ts && data.ts<=ts_max) {return [data];}
                                       else {return [];}
                                      }" }}]}
      EOF
      )
        echo -n $Data | curl -s http://$Host:$Port/mapred \
        -H 'content-type: application/json' --data-binary @-
        echo
      } 
    
      help ()
      {
        echo "           PRINT HELP: $0 -h"
        echo " POPULATE SAMPLE DATA: $0 -p"
        echo "    FETCH DEVICE LIST: $0 -l"
        echo "QUERY DATA FOR DEVICE: $0 -q DEV_ID TS_START TS_END"
      }
    
      f=$1; shift
    
      case $f in
        "" | "-h") help; exit 1 ;;
        "-p"     ) populate ;;
        "-l"     ) fetch_list ;;
        "-q"     ) query_data $@
      esac
    
      $> ./devices.sh -l
      []
      $> ./devices.sh -p
      populated
      $> ./devices.sh -l
      ["dev1","dev2","dev3"]
      $> ./devices.sh -q dev2 1 2
      [{"id":"d2","ts":1,"dat":"d_2-1"},{"id":"d2","ts":2,"dat":"d_2-2"}]
      $> ./devices.sh -q dev3 2 3
      [{"id":"d3","ts":3,"dat":"d_3-3"},{"id":"d3","ts":2,"dat":"d_3-2"}]
      $> ./devices.sh -q dev1 1 3
      [{"id":"d1","ts":2,"dat":"d_1-2"},{"id":"d1","ts":3,"dat":"d_1-3"},{"id":"d1","ts":1,"dat":"d_1-1"}]
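
The division of labour in `query_data` is: the index selects every record tagged with the device, then the map phase drops records outside the timestamp window. A rough Python equivalent of that JavaScript map function, run locally on the sample records for dev2 (the function name `map_ts_range` is hypothetical):

```python
import json


def map_ts_range(raw_value: str, arg: dict) -> list:
    """Mirror of the JavaScript map phase: keep the record if ts is in range."""
    data = json.loads(raw_value)
    ts_min, ts_max = int(arg["ts_min"]), int(arg["ts_max"])
    return [data] if ts_min <= data["ts"] <= ts_max else []


# Records stored under the index entry dev_bin = "dev2" by the script.
dev2 = [
    '{"id":"d2", "ts":1, "dat":"d_2-1"}',
    '{"id":"d2", "ts":2, "dat":"d_2-2"}',
    '{"id":"d2", "ts":3, "dat":"d_2-3"}',
]
arg = {"ts_min": "1", "ts_max": "2"}
result = [rec for raw in dev2 for rec in map_ts_range(raw, arg)]
print(result)
```

In the real job each map call runs next to the data, which is why the result order in the transcript above is not guaranteed.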
    

    RiakSearch, part1

    RiakSearch is a distributed, full-text search engine

    when to use

  • when fast information retrieval is needed
  • when the data is easily analyzed (JSON, txt, XML)

    when not to use

  • when only simple tagging of data is needed
  • when the data is not easily analyzed (mp3, mp4 or some other binary format)

    how it works

    when search is enabled on your cluster, a second set of vnodes, equal in number to the KV vnodes, is started to handle search requests

    search is enabled on a per-bucket basis by setting the bucket search property to true

    indexes are generated on all objects as they are written using the pre-commit hook. index data is replicated across the cluster

    RiakSearch uses timestamps to resolve version conflicts, not vector clocks

    RiakSearch does not use quorum values when writing (indexing) data. only one copy of the data is read

    datatypes

    RiakSearch is able to handle several standard data encodings with zero configuration. simply set the Content-Type metadata on your objects to the appropriate mime-type (using HTTP headers). out of the box XML, JSON and plain-text encodings are supported

    JSON encoded data

    if your data is in JSON format, set your Content-Type to one of:
  • application/json
  • application/x-javascript
  • text/javascript
  • text/x-javascript
  • text/x-json

    this causes RiakSearch to use the field names of the JSON object as index field names. nested objects use an underscore ( _ ) as a field name separator

    
       {"name":"Alice", "role":"student",
        "favorites":{"book":"Ulysses", "album":"The Doors", "film":"Matrix"}}
     
    would cause five fields to be indexed: “name”, “role”, “favorites_book”, “favorites_album” and “favorites_film”
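
The naming rule for nested JSON can be sketched in a few lines of Python. This only illustrates how field names are derived, not RiakSearch's actual extractor; `flatten_fields` is a hypothetical name.

```python
def flatten_fields(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {field_name: value}, joining keys with '_'."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten_fields(value, name))
        else:
            fields[name] = value
    return fields


doc = {"name": "Alice", "role": "student",
       "favorites": {"book": "Ulysses", "album": "The Doors", "film": "Matrix"}}
print(sorted(flatten_fields(doc)))
# ['favorites_album', 'favorites_book', 'favorites_film', 'name', 'role']
```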

    you could later query this data with queries like “role:student AND favorites_album:Doors”

    plain-text data

    if your data is plain text, set your Content-Type to text/plain. the plain-text decoder is also used if no Content-Type is found

    it will cause RiakSearch to index all of the text in the object's value under a single field, named “value”. queries can be explicit about searching this field, as in “value:seven AND value:score”, or omit the default field name, as in “seven AND score”

    erlang data

    if your object contains Erlang terms, you can set your Content-Type to application/x-erlang. this expects either an Erlang term that is a proplist or a nested proplist

    in the case of a proplist, the key is used as the field name, and the value as the field value. when the object's value is a nested proplist, field names are constructed by concatenating the nested keys together with underscores in between

    enable RiakSearch

    by default, RiakSearch is disabled
    1. edit each node’s app.config to enable search: change the value of enabled from false to true:

        {riak_search, [{enabled, true}]},

    2. you cannot dedicate a node to RiakSearch: every node in the cluster is required to participate, so every node must have RiakSearch enabled
    3. by default, none of the buckets are indexed
    4. to enable indexing for a bucket, simply add the search precommit hook to that bucket's properties or, from command line:
      
        $RIAK_PATH/bin/search-cmd install foo
      
    5. define a schema file for the bucket
    6. the default schema treats all fields as strings, unless you suffix your field name as follows:
      field name                        field type         analyzer
      FIELDNAME_num, FIELDNAME_int      num field          Integer Analyzer
      FIELDNAME_dt, FIELDNAME_date      date field         No-Op analyzer
      FIELDNAME_txt, FIELDNAME_text     full text field    Standard Analyzer
      all other fields                                     Whitespace Analyzer

      the default field is named "value"

      the schema definition for the bucket is stored in the riak bucket _rs_schema, under a key with the same name as the bucket. writing to this bucket directly is highly discouraged

      Schema-level Properties

      version
          required. a version number
      default_field
          required
      default_op
          optional. “and” | “or” (default)
      n_val
          optional. default = 3
      analyzer_factory
          optional

      field-Level Properties

      fields can be either static or dynamic

      a static field is denoted with field at the start of the field definition. field will perform an exact string match on a field name

      a dynamic field is denoted with dynamic_field. field will perform a wildcard match on the string name

      field matching occurs in the order of appearance in the schema definition. this allows you to create a number of static fields followed by a dynamic field as a “catch all” to match the rest
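
A small sketch of that first-match rule, assuming glob-style wildcards (Python's `fnmatchcase` stands in for riak's own matching, which may differ in detail). The two definitions are illustrative: one static integer field followed by a catch-all dynamic field.

```python
from fnmatch import fnmatchcase

# (kind, name pattern, type) in schema order.
SCHEMA_FIELDS = [
    ("field", "address_zip", "integer"),
    ("dynamic_field", "*", "string"),
]


def match_field(name: str, fields=SCHEMA_FIELDS):
    """Return the first definition matching name, or None if nothing matches."""
    for kind, pattern, ftype in fields:
        if kind == "field" and name == pattern:
            return (kind, pattern, ftype)
        if kind == "dynamic_field" and fnmatchcase(name, pattern):
            return (kind, pattern, ftype)
    return None


print(match_field("address_zip"))  # matched by the static field
print(match_field("name"))         # falls through to the catch-all
```

Swapping the two entries would make the catch-all win for every name, which is why the static fields come first.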

      name
          required. dynamic fields can use wildcards.
          note that the unique field identifying a document must be named “id”
      required
          optional. boolean flag indicating whether this field is required in an incoming document.
          if missing, then the document will fail validation
          defaults to false
      type
          optional. “string” (default) | “integer”
          if “integer” and no field-level analyzer_factory is defined, then Whitespace analyzer is used
      analyzer_factory
          optional. if not specified, defaults to the analyzer factory for the schema (unless the field is an integer type)
      skip
          optional. "true" | "false" (default)
          when “true”, the field is stored, but not indexed
      alias
          optional. you can add as many alias settings as you like
      padding_size
          optional. values are padded up to this size
          defaults to 0 for string types, 10 for integer types
      inline
          optional. “true” | “false” (default) | “only”
          when “only”, the field will not be searchable by itself but can be used as a “filter” for searches on other fields. field value is stored “inline” with the indexes for other fields
          when “true”, the field will be stored normally in addition to inline
    7. schema setting
    8. you can set/retrieve the bucket schema using command line tools:
       
        # to set schema
        $RIAK_DIR/bin/search-cmd set-schema BucketName SchemaFile
      
        # to view schema
        $RIAK_DIR/bin/search-cmd show-schema BucketName 
       
      note that changes to the SchemaFile will not affect previously indexed data

    example

    enable search for bucket:
      $> sudo /opt/riak/bin/search-cmd install users
       :: Installing Riak Search <--> KV hook on bucket 'users'.
      $> curl $host/riak/users
      {"props":{ ...  "precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}], ... }}
    
    put down a schema file for the bucket:
    
      %% schema for bucket 'users'
      { schema, [{version, "1.1"}, {n_val, 3}, {default_field, "name"},
                 {analyzer_factory, {erlang, text_analyzers, whitespace_analyzer_factory}}],
          %% field "zip" is indexed as integer
        [ {field, [{name, "address_zip"}, {type, integer},
            {analyzer_factory, {erlang, text_analyzers, integer_analyzer_factory}} ]},
          %% everything else is a string
          {dynamic_field, [ {name, "*"}, {type, string},
          {analyzer_factory, {erlang, text_analyzers, whitespace_analyzer_factory}} ]} ]}.
    
    set it with:
      $> sudo /opt/riak1/bin/search-cmd set-schema users /opt/erl/schema
       :: Updating schema for 'users'...
       :: Clearing schema caches...
       :: Done.
    
    input data:
      $> curl -XPUT http://192.168.2.2:8098/buckets/users/keys/1234 \
      > -H 'Content-type: application/json' \
      > -d '{"name": "Joe", "address": {"street": "123 Some Street", "zip": 12345}}'
    
    now:

    search, HTTP API

    RiakSearch supports a Solr-compatible interface for searching documents via HTTP.
        http://hostname:8098/solr/BUCKET_NAME/select
    
    the following parameters are supported:
    q=QUERY
        run the provided query
    df=FIELDNAME
        use the provided field as the default
        overrides the “default_field” setting in the schema file
    q.op=OPERATION
        "and" | "or" (default)
        overrides the “default_op” setting in the schema file
    start=N
        specify the starting result of the query
        default is 0
    rows=N
        specify the maximum number of results to return
        default is 10
    sort=FIELDNAME
        sort on the specified field name after the given rows are found
        default is “none” (sorted in descending order by score)
    wt=FORMAT
        choose the format of the output
        “json” | “xml” (default)
    with subfield:
      $> curl http://192.168.2.2:8098/solr/users/select?'q=address_zip:12345&wt=json'
      {"responseHeader": {
         "status":0,
         "QTime":3,
         "params": { "q":"address_zip:12345", "q.op":"or", "filter":"", "wt":"json"}},
         "response": {"numFound":1, "start":0, "maxScore":"0.353553",
                      "docs":[{"id":"1234", "index":"users",
                               "fields":{"address_street":"123 Some Street",
                                         "address_zip":12345, "name":"Joe"},
                               "props":{}}]}}
    
    with default field:
      $> curl http://192.168.2.2:8098/solr/users/select?'q=Joe&wt=json'
      {"responseHeader": {
          "status":0,
          "QTime":3,
          "params":   {"q":"Joe", "q.op":"or", "filter":"", "wt":"json"}},
          "response": {"numFound":1, "start":0, "maxScore":"0.353553",
              "docs": [{"id":"1234", "index":"users",
                    "fields":{"address_street":"123 Some Street",
                              "address_zip":12345, "name":"Joe"},
                    "props":{}}]}}
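
A hedged client-side sketch of the same call: it builds the select URL from the parameters listed above and pulls the document ids out of a `wt=json` response shaped like the ones shown. Nothing is sent over the network; `select_url` and `doc_ids` are hypothetical names.

```python
import json
from urllib.parse import urlencode


def select_url(host: str, bucket: str, query: str, **params) -> str:
    """Build a /solr/BUCKET/select URL; extra params (df, rows, wt) pass through."""
    qs = urlencode({"q": query, **params})
    return f"http://{host}/solr/{bucket}/select?{qs}"


def doc_ids(response_text: str) -> list:
    """Return the ids of the matching documents from a wt=json response body."""
    body = json.loads(response_text)
    return [doc["id"] for doc in body["response"]["docs"]]


print(select_url("192.168.2.2:8098", "users", "address_zip:12345", wt="json"))
```

`urlencode` percent-encodes the colon in the query, which the server decodes back to `address_zip:12345`.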
    

    search from command line

    subfield search:
      $> sudo /opt/riak1/bin/search-cmd search users "address_street:Some Street"
       :: Searching for 'address_street:Some Street' / '' in users..
       ------------------------------
       index/id: users/1234
       p -> [0]
       score -> 0.35355339059327373
       ------------------------------
      :: Found 1 results.
    
    and now, with default field:
      $> sudo /opt/riak1/bin/search-cmd search users "Joe"
       :: Searching for 'Joe' / '' in users...
       ------------------------------
       index/id: users/1234
       p -> [0]
       score -> 0.35355339059327373
       ------------------------------
      :: Found 1 results.
    

    search via the Erlang shell

    this will display a list of DocID values matching the query

    with default field searching:

      (riak2@127.0.0.1)3> search:search(<<"users">>, <<"Joe">>).     
      {1, [{<<"users">>,<<"1234">>, [{p,[0]},{score,0.35355339059327373}]}]}
    
    with the subfield:
      (riak2@127.0.0.1)5> search:search(<<"users">>, <<"address_zip:12345">>).  
      {1, [{<<"users">>,<<"1234">>, [{p,[0]},{score,0.35355339059327373}]}]}
    
    to conduct a document search, use:
      (riak1@127.0.0.1)7> search:search_doc(<<"users">>, <<"name:Joe">>).
      {1,"0.353553",
       [{riak_idx_doc,<<"users">>,<<"1234">>,
                       [{<<"address_street">>,<<"123 Some Street">>,
                       [{<<"123">>,[0]},{<<"Street">>,[2]},{<<"Some">>,[1]}]},
                       {<<"address_zip">>,<<"12345">>,[{<<"0000012345">>,[0]}]},
                       {<<"name">>,<<"Joe">>,[{<<"Joe">>,[0]}]}],
                       [],[],true}]}
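
The term stored for address_zip in the output above, 0000012345, is the zip code padded to 10 characters, which matches the padding_size default for integer fields. A plausible reason for padding (an assumption, not stated in the source) is that padded terms sort as strings in the same order as the numbers they represent. A sketch, with the hypothetical helper `pad_integer`:

```python
def pad_integer(value: int, padding_size: int = 10) -> str:
    """Left-pad a non-negative integer with zeros up to padding_size characters."""
    if value < 0:
        raise ValueError("this sketch only covers non-negative integers")
    return str(value).zfill(padding_size)


print(pad_integer(12345))  # 0000012345

raw = ["9", "10", "12345"]
padded = [pad_integer(int(v)) for v in raw]
print(sorted(raw))     # ['10', '12345', '9']  string order, not numeric order
print(sorted(padded))  # padded terms sort in numeric order
```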
    

    querying integrated with MapReduce

    kicking off a map/reduce query with the same result set over HTTP would use a POST body like this:
     
    { "inputs": { "bucket":"mybucket", "query":"foo OR bar" },
       "query": ...  }
     
    the phases in the “query” field should be exactly the same as usual. an initial map phase will be given each object matching the search for processing

    the query field specifies the search query. all syntax available in other Search interfaces is available in this query field
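
As an illustration, the POST body for such a job can be assembled like this. The map phase reuses the built-in `riak_kv_mapreduce:map_object_value` function from the 2i example earlier; the helper name `search_mapred_body` is hypothetical, and the body would be posted to /mapred with content-type application/json.

```python
import json


def search_mapred_body(bucket: str, query: str) -> str:
    """JSON body for a MapReduce job whose inputs come from a search query."""
    job = {
        "inputs": {"bucket": bucket, "query": query},
        # One map phase that returns the value of each matching object.
        "query": [{"map": {"language": "erlang",
                           "module": "riak_kv_mapreduce",
                           "function": "map_object_value"}}],
    }
    return json.dumps(job)


print(search_mapred_body("mybucket", "foo OR bar"))
```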

    does enabling search have any impact on RAM usage?

    yes, the index engine behind RiakSearch makes heavy usage of Erlang ETS tables. each partition has an in-memory buffer as well as an in-memory offset table for every segment. it also uses a temporary ETS table for every write to store posting data. the ETS system limit can even become an issue in overload scenarios

    does enabling search have any impact on read latency/throughput?

    if you are reading and searching at the same time there is a good chance it will. it will cause more disk seeks

    degradation of throughput after enabling RiakSearch is normal. RiakSearch does a lot of work during index time. it has to analyze the data and each indexed document has a good chance of causing writes to every node in the cluster because of term-based partitioning

    a single-term query with a reasonable result set typically has a latency of a disk seek plus a few milliseconds

    a range query can produce latency variance because it requires connecting to a covering set of vnodes

    indexing already existing objects

    as an alternative to re-writing the objects to cause them to be indexed, you may invoke a map operation with side-effects

    you define an Erlang map-phase function as follows:

    
      map_reindex({error,notfound}, _, _) ->
        [];
      map_reindex(RiakObject, _, _) ->
        riak_search_kv_hook:precommit(RiakObject),
        [].
    
    
    you want to run that against all of the keys in the bucket by posting a mapred job like this:
    
      { "inputs": "<your-bucket>",
        "query": [
          { "map": {
              "language" : "erlang",
              "module"   : "<your-module>",
              "function" : "map_reindex"
          }}],
        "timeout": <your-timeout> }
    
    

    caching

    virtually everything in riak is transient: for the most part, there is no caching going on in riak outside of the operating system, i.e. your subsequent queries will do more or less the same work as their predecessors

    RiakLinks

    every datum stored in riak can have one-way relationships to other data via the Link HTTP header

    RiakLinks are metadata that establish these relationships between objects. once attached, they enable you to run queries that "walk" from one object to another. with them, you create lightweight pointers between your data

    RiakLinks live in the metadata of an object and are attached to it via the "Link:" HTTP header
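
A minimal sketch of the header format, assuming the `</riak/BUCKET/KEY>; riaktag="TAG"` shape used by the curl examples in this section. `link_value` and `parse_links` are hypothetical helpers; the regular expression is a simplification that covers only this shape.

```python
import re

# One link: <target>; attribute="value"
LINK_RE = re.compile(r'<([^>]+)>;\s*(\w+)="([^"]*)"')


def link_value(bucket: str, key: str, tag: str) -> str:
    """Format one link as it appears in a Link header value."""
    return f'</riak/{bucket}/{key}>; riaktag="{tag}"'


def parse_links(header_value: str) -> list:
    """Split a Link header value into (target, attribute, value) tuples."""
    return LINK_RE.findall(header_value)


print(link_value("albums", "AbbeyRoad", "disc"))
print(parse_links('</riak/tracks/Something>; riaktag="lyrics", </riak/albums>; rel="up"'))
```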

    example 1

    input some data about four people:
      $> curl -H 'content-Type: text/plain' $host/riak/artists/TheBeatles \
      -d " John Lennon, Paul McCartney, George Harrison, Ringo Starr " \
      -H 'Link: </riak/albums/AHardDaysNight>; riaktag="disc"' \
      -H 'Link: </riak/albums/AbbeyRoad>; riaktag="disc"'
    
      $> curl -H 'content-Type: text/plain' $host/riak/albums/AHardDaysNight \
      -d "Title : A Hard Day's Night; 
      Released : 26 June 1964;
      Label : Abbey Road Studios;" \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/tracks/AHardDaysNight>; riaktag="lyrics"' \
      -H 'Link: </riak/tracks/IShouldHaveKnownBetter>; riaktag="lyrics"'
    
      $> curl -H 'content-Type: text/plain' $host/riak/tracks/IShouldHaveKnownBetter \
      -d "I should have known better with a girl like you.
      That I would love ev'rything that do,
      and I do hey, hey, hey, and I do." \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/albums/AHardDaysNight>; riaktag="disc"'
    
      $> curl -H 'content-Type: text/plain' $host/riak/tracks/AHardDaysNight \
      -d "It's been a hard day's night,
      And I've been working like a dog.
      It's been a hard day's night,
      I should be sleeping like a log." \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/albums/AHardDaysNight>; riaktag="disc"' 
    
      $> curl -H 'content-Type: text/plain' $host/riak/albums/AbbeyRoad \
      -d "Title : AbbeyRoad;
      Released : 26th September, 1969; 
      Label : Abbey Road Studios;" \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/tracks/ComeTogether>; riaktag="lyrics"' \
      -H 'Link: </riak/tracks/Something>; riaktag="lyrics"'  
    
      $> curl -H 'content-Type: text/plain' $host/riak/tracks/ComeTogether \
      -d "Here come old flat top.
      He come grooving up slowly.
      He got joo joo eyeball.
      He one holy roller." \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/albums/AbbeyRoad>; riaktag="disc"'
    
      $> curl -H 'content-Type: text/plain' $host/riak/tracks/Something \
      -d "Something in the way she moves
      Attracts me like no other lover,
      Something in the way she woos me.
      I don't want to leave her now,
      You know I believe and how." \
      -H 'Link: </riak/artists/TheBeatles>; riaktag="band"' \
      -H 'Link: </riak/albums/AbbeyRoad>; riaktag="disc"'
    
    to retrieve RiakLinks use curl -v

    once you have tagged objects with RiakLinks, you can then traverse them with an operation called "Link Walking"

      $ curl $host/riak/artists/TheBeatles/albums,disc,1
    
    /albums,disc,1 is tacked onto the end of that request. that is the link specification. it's always composed of three parts:
    Bucket
        a bucket name to limit the links
    Tag
        the riaktag to limit the links
    Keep
        0 | 1 : whether to return results from this phase

    the results of the previous query would be:
    
      --CrvK6CKepNYbKOZvrObYr1vp8XA
      Content-Type: multipart/mixed; boundary=MeuW8X9p15MNYgMuktHxNg2sc75
    
      --MeuW8X9p15MNYgMuktHxNg2sc75
      X-Riak-Vclock: a85hYGBgzGDKBVIcBy/FcwayKQllMCWy5LEy+Ff0nuLLAgA=
      Location: /riak/albums/AbbeyRoad
      Content-Type: text/plain
      Link: </riak/tracks/Something>; riaktag="lyrics", 
            </riak/tracks/ComeTogether>; riaktag="lyrics", 
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/albums>; rel="up"
      Etag: 1NogZjx2aazLWstYV3A6ZK
      Last-Modified: Mon, 11 Feb 2013 14:12:31 GMT
    
    
      Title : AbbeyRoad;
      Released : 26th September, 1969;
      Label : Abbey Road Studios;
      --MeuW8X9p15MNYgMuktHxNg2sc75
      X-Riak-Vclock: a85hYGBgzGDKBVIcBy/FcwayKQllMCWy5LEy+Ff0nuLLAgA=
      Location: /riak/albums/AHardDaysNight
      Content-Type: text/plain
      Link: </riak/tracks/IShouldHaveKnownBetter>; riaktag="lyrics",
            </riak/tracks/AHardDaysNight>; riaktag="lyrics",
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/albums>; rel="up"
      Etag: 1f8jQ7SzMqFNaeHTUqn0xc
      Last-Modified: Mon, 11 Feb 2013 14:12:31 GMT
    
    
      Title : A Hard Day's Night;
      Released : 26 June 1964;
      Label : Abbey Road Studios;
      --MeuW8X9p15MNYgMuktHxNg2sc75--
    
      --CrvK6CKepNYbKOZvrObYr1vp8XA--
    
    you can replace both the Bucket and the Tag fields in the link spec with an underscore. this will tell the query to match any bucket or tag name

    you can walk any number of links with one request, and you can choose to have all the objects matching a single step returned with the end result

    each step you walk is referred to as a phase, because under the hood a link walking request uses the same mechanism as MapReduce: every step specified in the URL is translated into a single MapReduce phase. if you want to walk multiple steps, you can use the Keep parameter to specify which steps you're particularly interested in. by default, riak will only include the objects found by the last step
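
A sketch of how a multi-step walk maps onto the URL: each step becomes one `Bucket,Tag,Keep` segment, with an underscore wherever a part is left open. The helper name `walk_path` is hypothetical.

```python
def walk_path(bucket: str, key: str, steps) -> str:
    """Build a link-walking path from (bucket, tag, keep) steps.

    None for bucket or tag becomes '_' (match anything); keep is True/False,
    or None to leave the choice to riak ('_').
    """
    def part(value):
        return "_" if value is None else str(value)

    specs = []
    for step_bucket, tag, keep in steps:
        keep_part = "_" if keep is None else str(int(bool(keep)))
        specs.append(f"{part(step_bucket)},{part(tag)},{keep_part}")
    return f"/riak/{bucket}/{key}/" + "/".join(specs)


print(walk_path("artists", "TheBeatles", [("albums", "disc", True)]))
print(walk_path("artists", "TheBeatles", [("albums", None, None), ("tracks", None, True)]))
```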

    to remove a link from an object: read (GET) the object, remove the link information, and write it back into riak

    RiakLinks vs SQL JOIN

    RiakLinks are much less painful than a JOIN in SQL because each item is operated on independently, rather than a table at a time

    here’s what that query would look like:

     
      GET /riak/artists/TheBeatles/albums,_,_/tracks,_,1
     
    that feels more natural than this:
     
      SELECT tracks.* FROM tracks
        INNER JOIN albums ON tracks.album_id = albums.id
          INNER JOIN artists ON albums.artist_id = artists.id
            WHERE artists.name = 'TheBeatles'
     
    the result of the previous query would be:
    
      --KWAvCldPrQTMWAxVhTPjK8cHp8L
      Content-Type: multipart/mixed; boundary=LztNsc7zo2dB2FJLElYf5L4f372
    
      --LztNsc7zo2dB2FJLElYf5L4f372
      X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgb8/s2ewZTImsfK8LWj9xRfFgA=
      Location: /riak/tracks/Something
      Content-Type: text/plain
      Link: </riak/albums/AbbeyRoad>; riaktag="disc",
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/tracks>; rel="up"
      Etag: 4ZBL3Ku8iGzmE9QhjJceKs
      Last-Modified: Mon, 11 Feb 2013 15:23:33 GMT
    
    
      Something in the way she moves
      Attracts me like no other lover,
      Something in the way she woos me.
      I don't want to leave her now,
      You know I believe and how.
      --LztNsc7zo2dB2FJLElYf5L4f372
      X-Riak-Vclock: a85hYGBgzGDKBVIcBy/FcwayKQllMCWy5rEyfO3oPcWXBQA=
      Location: /riak/tracks/AHardDaysNight
      Content-Type: text/plain
      Link: </riak/albums/AHardDaysNight>; riaktag="disc",
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/tracks>; rel="up"
      Etag: 6wgu7BhB49j2Yd3SyaKEwo
      Last-Modified: Mon, 11 Feb 2013 15:23:33 GMT
    
    
      It's been a hard day's night,
      And I've been working like a dog.
      It's been a hard day's night,
      I should be sleeping like a log.  
      --LztNsc7zo2dB2FJLElYf5L4f372
      X-Riak-Vclock: a85hYGBgzGDKBVIcBy/FcwayKQllMCWy5rEyfO3oPcWXBQA=
      Location: /riak/tracks/IShouldHaveKnownBetter
      Content-Type: text/plain
      Link: </riak/albums/AHardDaysNight>; riaktag="disc",
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/tracks>; rel="up"
      Etag: 7VXunYBxxl67iBnEXKW6Vj
      Last-Modified: Mon, 11 Feb 2013 15:23:33 GMT
    
    
      I should have known better with a girl like you.
      That I would love ev'rything that do,
      and I do hey, hey, hey, and I do.
      --LztNsc7zo2dB2FJLElYf5L4f372
      X-Riak-Vclock: a85hYGBgzGDKBVIcBy/FcwayKQllMCWy5rEyfO3oPcWXBQA=
      Location: /riak/tracks/ComeTogether
      Content-Type: text/plain
      Link: </riak/albums/AbbeyRoad>; riaktag="disc",
            </riak/artists/TheBeatles>; riaktag="band",
            </riak/tracks>; rel="up"
      Etag: 38jXKojjbydhvP45ookcQp
      Last-Modified: Mon, 11 Feb 2013 15:23:33 GMT
    
    
      Here come old flat top.
      He come grooving up slowly.
      He got joo joo eyeball.
      He one holy roller.
      --LztNsc7zo2dB2FJLElYf5L4f372--
    
    --KWAvCldPrQTMWAxVhTPjK8cHp8L--
    

    example 2

    taken from Sean Cribbs

    suppose that you have two kinds of relationships between people in some group: supervising > and friendship =

    let's load it into riak with the script load_people.sh:

      #!/bin/bash
    
      <<COMMENTS
           Ava   >   Bob
           Ava   >   Peppy
           Bob   >   Sam
           Bob   >   Tom
           Ada   >   Mary
           Cat   >   Rem
           Bob   =   Peppy
           Cat   =   Tom   
           Mary  =   Tom
           Rem   =   Sam   
           Sam   =   Tom   
      COMMENTS
    
      Host="http://192.168.2.2:8098"
    
      curl -d 'Ada' -X PUT $Host/riak/people/ada -H 'content-type: text/plain' \
      -H 'Link: </riak/people/mary>; riaktag="reporter"'
      curl -d 'Ava' -X PUT $Host/riak/people/ava -H 'content-type: text/plain' \
      -H 'Link: </riak/people/bob>; riaktag="reporter", \
                </riak/people/peppy>; riaktag="reporter"'
      curl -d 'Bob' -X PUT $Host/riak/people/bob -H 'content-type: text/plain' \
      -H 'Link: </riak/people/sam>; riaktag="reporter", \
                </riak/people/tom>; riaktag="reporter", \
                </riak/people/peppy>; riaktag="friend"'
      curl -d 'Cat' -X PUT $Host/riak/people/cat -H 'content-type: text/plain' \
      -H 'Link: </riak/people/rem>; riaktag="reporter",  \
                </riak/people/tom>; riaktag="friend"'
      curl -d 'Mary' -X PUT $Host/riak/people/mary -H 'content-type: text/plain' \
      -H 'Link: </riak/people/tom>; riaktag="friend"'
      curl -d 'Peppy' -X PUT $Host/riak/people/peppy -H 'content-type: text/plain' \
      -H 'Link: </riak/people/bob>; riaktag="friend"'
      curl -d 'Rem' -X PUT $Host/riak/people/rem -H 'content-type: text/plain' \
      -H 'Link: </riak/people/sam>; riaktag="friend"'
      curl -d 'Sam' -X PUT $Host/riak/people/sam -H 'content-type: text/plain' \
      -H 'Link: </riak/people/tom>; riaktag="friend", \
                </riak/people/rem>; riaktag="friend"'
      curl -d 'Tom' -X PUT $Host/riak/people/tom -H 'content-type: text/plain' \
      -H 'Link: </riak/people/sam>; riaktag="friend", \
                </riak/people/mary>; riaktag="friend", \
                </riak/people/cat>; riaktag="friend"'
      echo done
    
    the workhorse script, people_queries.sh:
      #!/bin/bash
    
      names='ada|ava|bob|cat|mary|peppy|rem|sam|tom'
      host='http://192.168.2.2:8098'
    
      myprint ()
      {
        awk -e \
          'BEGIN                                 { print "\nfrom riak : "; }
           NF == 1 && $1 !~ /--/ && $1 !~ /^\r$/ { print "\t" $1; x++; }
           END                                   { if (x == 0) print "\tnot found"
                                                   print "finished\n\n"; }'
      }
    
      echo -n 'friends of ' $names ' ... '; read x; 
      curl -s $host/riak/people/$x/_,friend,1 | myprint
      echo -n 'direct reporters of ' $names ' ... '; read x; 
      curl -s $host/riak/people/$x/_,reporter,1 | myprint
      echo -n 'friends of direct reporters of ' $names ' ... '; read x
      curl -s $host/riak/people/$x/_,reporter,0/_,friend,1 | myprint
      echo -n 'direct and indirect reporters of ' $names ' ... '; read x
      curl -s $host/riak/people/$x/_,reporter,1/_,reporter,1 | myprint
      echo -n 'friends of indirect reporters of ' $names ' ... '; read x
      curl -s $host/riak/people/$x/_,reporter,0/_,reporter,0/_,friend,1 | myprint
      echo -n 'friends of friends of direct reporters of ' $names ' ... '; read x
      curl -s $host/riak/people/$x/_,reporter,0/_,friend,0/_,friend,1 | myprint
      echo -n 'people who report to friends of ' $names ' ... '; read x
      curl -s $host/riak/people/$x/_,friend,0/_,reporter,1 | myprint
    
    now:
      $> ./load_people.sh 
      done
      $> ./people_queries.sh 
      friends of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... peppy
      from riak : 
              Bob
      finished
      direct reporters of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... ava
      from riak : 
              Bob
              Peppy
      finished
      friends of direct reporters of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... cat
      from riak : 
              Sam
      finished
      direct and indirect reporters of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... mary
      from riak : 
              not found
      finished
      friends of indirect reporters of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... ada
      from riak : 
              not found
      finished
      friends of friends of direct reporters of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... bob
      from riak : 
              Mary
              Cat
              Sam
              Rem
              Tom
      finished
      people who report to friends of  ada|ava|bob|cat|mary|peppy|rem|sam|tom  ... rem
      from riak : 
              not found
      finished
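
The walks above can be reproduced without a server by modelling the links as a small graph. This is only a local illustration of the phase semantics (follow a tag, optionally keep the results), not riak's implementation; the function name `walk` is hypothetical and the link table is transcribed from load_people.sh.

```python
# Links loaded by load_people.sh: key -> list of (target key, riaktag).
LINKS = {
    "ada": [("mary", "reporter")],
    "ava": [("bob", "reporter"), ("peppy", "reporter")],
    "bob": [("sam", "reporter"), ("tom", "reporter"), ("peppy", "friend")],
    "cat": [("rem", "reporter"), ("tom", "friend")],
    "mary": [("tom", "friend")],
    "peppy": [("bob", "friend")],
    "rem": [("sam", "friend")],
    "sam": [("tom", "friend"), ("rem", "friend")],
    "tom": [("sam", "friend"), ("mary", "friend"), ("cat", "friend")],
}


def walk(start: str, steps) -> set:
    """Follow (tag, keep) steps from start; return keys found in kept phases."""
    current, kept = {start}, set()
    for tag, keep in steps:
        current = {target for key in current
                   for target, link_tag in LINKS.get(key, [])
                   if link_tag == tag}
        if keep:
            kept |= current
    return kept


# friends of friends of direct reporters of bob
print(sorted(walk("bob", [("reporter", False), ("friend", False), ("friend", True)])))
# ['cat', 'mary', 'rem', 'sam', 'tom']
```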
    

    good style, but don't build a linked list with RiakLinks unless the list is small