Driving force behind NoSQL:
Revisiting the history of the document model and the relational model:
The arguments are:
Document: schema flexibility, better performance due to locality, and a structure closer to the application's objects.
Relational: better support for joins, and for many-to-one and many-to-many relationships.
In a declarative language, you describe the result you want, and an interpreter or query optimizer works out which instructions to run and in what order. In an imperative language, the programmer has to define the instructions and their order.
Example in a browser: XSL and CSS selectors are declarative.
If you want the paragraph of the selected page to have a blue background, using CSS:
css
li.selected > p {
  background-color: blue;
}
xsl
<!-- the match attribute is an XPath expression -->
<xsl:template match="li[@class='selected']/p">
  <fo:block background-color="blue">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
Using the core DOM API, the imperative code would look like this:
javascript
var liElements = document.getElementsByTagName("li");
for (var i = 0; i < liElements.length; i++) {
  if (liElements[i].className === "selected") {
    var children = liElements[i].childNodes;
    for (var j = 0; j < children.length; j++) {
      var child = children[j];
      if (child.nodeType === Node.ELEMENT_NODE && child.tagName === "P") {
        child.setAttribute("style", "background-color: blue");
      }
    }
  }
}
With CSS, the browser automatically detects when the li.selected > p rule no longer applies and removes the blue background as soon as the selected class is removed. With the imperative code, the blue background is not removed.
CSS and XPath hide the DOM details. To benefit from a faster API such as document.getElementsByClassName("selected"), the CSS queries do not have to change; the imperative code has to be rewritten.
MapReduce is a programming model for processing large amounts of data in bulk across many machines, popularized by Google.
A limited form of MapReduce is supported by some NoSQL datastores, including MongoDB and CouchDB, to perform read-only queries across many documents.
It sits between declarative and imperative: the query logic is expressed with snippets of code, which the processing framework calls repeatedly.
It's based on the map and reduce functions.
An example: you are a marine biologist who adds observation records to your database, and one day you want to generate a report of how many sharks you sighted per month.
In PostgreSQL (which has no MapReduce support):
postgresql
SELECT date_trunc('month', observation_timestamp) AS observation_month,
sum(num_animals) AS total_animals
FROM observations
WHERE family = 'Sharks'
GROUP BY observation_month;
The date_trunc('month', timestamp) function determines the calendar month containing timestamp, and returns another timestamp representing the beginning of that month. In other words, it rounds a timestamp down to the nearest month.
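To make the rounding concrete, here is a minimal JavaScript sketch of the same idea. The helper name truncToMonth is hypothetical, and the sketch assumes UTC; PostgreSQL's own time-zone handling is not reproduced here.

```javascript
// Hypothetical helper (not a PostgreSQL or MongoDB API): rounds a Date down
// to the first instant of its calendar month. UTC is an assumption made to
// keep the result independent of the local time zone.
function truncToMonth(timestamp) {
  return new Date(Date.UTC(timestamp.getUTCFullYear(), timestamp.getUTCMonth(), 1));
}

const observed = new Date(Date.UTC(1995, 11, 25, 12, 34, 56)); // 25 Dec 1995
console.log(truncToMonth(observed).toISOString()); // 1995-12-01T00:00:00.000Z
```

Every observation in the same month maps to the same truncated value, which is what makes it usable as a GROUP BY key.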
In MongoDB:
db.observations.mapReduce(
  function map() {
    var year = this.observationTimestamp.getFullYear();
    var month = this.observationTimestamp.getMonth() + 1;
    emit(year + "-" + month, this.numAnimals);
  },
  function reduce(key, values) {
    return Array.sum(values);
  },
  {
    query: { family: "Sharks" },
    out: "monthlySharkReport"
  }
);
The filter to consider only shark species can be specified declaratively (this is a MongoDB-specific extension to MapReduce).
The JavaScript function map is called once for every document that matches query, with this set to the document object.
The map function emits a key (a string consisting of year and month, such as "2013-12" or "2014-1") and a value (the number of animals in that observation).
The key-value pairs emitted by map are grouped by key. For all key-value pairs with the same key (i.e., the same month and year), the reduce function is called once.
The reduce function adds up the number of animals from all observations in a particular month. The final output is written to the collection monthlySharkReport.
Map and reduce are restricted in what they can do: they must be pure functions that use only the data passed to them and cannot perform additional database queries.
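The steps above can be sketched in plain JavaScript. This is a hypothetical in-memory stand-in for the framework, not MongoDB's implementation: runMapReduce and the sample observations are made up for illustration, and emit is passed to map as an argument here, whereas MongoDB provides it as a global.

```javascript
// Hypothetical mini-framework: filter, call map per document, group the
// emitted pairs by key, then call reduce once per key.
function runMapReduce(documents, query, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  for (const doc of documents) {
    // Declarative filter: every field in `query` must equal the document's field.
    const matches = Object.entries(query).every(([field, wanted]) => doc[field] === wanted);
    if (matches) map.call(doc, emit); // `this` is set to the document
  }

  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// Made-up sample data (assumption, not from the original notes).
const observations = [
  { observationTimestamp: new Date(1995, 11, 25), family: "Sharks", numAnimals: 3 },
  { observationTimestamp: new Date(1995, 11, 12), family: "Sharks", numAnimals: 4 },
  { observationTimestamp: new Date(1996, 0, 5), family: "Sharks", numAnimals: 2 },
  { observationTimestamp: new Date(1996, 0, 9), family: "Rays", numAnimals: 7 },
];

function map(emit) {
  const year = this.observationTimestamp.getFullYear();
  const month = this.observationTimestamp.getMonth() + 1;
  emit(year + "-" + month, this.numAnimals);
}

function reduce(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

const report = runMapReduce(observations, { family: "Sharks" }, map, reduce);
console.log(report); // 1995-12 has 7 animals, 1996-1 has 2
```

Because map and reduce only touch their inputs, the framework is free to run them anywhere and in any order, which is what the purity restriction buys.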
SQL can also be implemented on top of MapReduce to run on a distributed system.
Usability problem: you have to write two carefully coordinated JavaScript functions, which is usually harder than writing a single query.
MongoDB supports a declarative query language called the aggregation pipeline, which gives a query optimizer room to improve performance. It is like a subset of SQL, but with a JSON-based syntax.
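As a sketch, the shark report could be expressed as an aggregation pipeline like the one below. The stage operators ($match, $group, $year, $month, $sum) are MongoDB's own; the field name totalAnimals is borrowed from the SQL version, and the pipeline is built as a plain array so that it can be inspected without a database.

```javascript
// The pipeline is just JSON-like data: first filter, then group and sum.
const pipeline = [
  { $match: { family: "Sharks" } },
  {
    $group: {
      _id: {
        year: { $year: "$observationTimestamp" },
        month: { $month: "$observationTimestamp" },
      },
      totalAnimals: { $sum: "$numAnimals" },
    },
  },
];

// In the mongo shell this would be passed to db.observations.aggregate(pipeline).
console.log(JSON.stringify(pipeline, null, 2));
```

Compared with the mapReduce call, there are no user-written functions to coordinate: the grouping key and the sum are stated declaratively.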
Each vertex consists of: an id, a set of outgoing edges, a set of incoming edges, and a collection of properties (key-value pairs).
Each edge consists of: an id, the tail vertex (where the edge starts), the head vertex (where the edge ends), a label describing the relationship between the two vertices, and a collection of properties (key-value pairs).
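A minimal in-memory sketch of this structure, assuming string vertex ids and numeric edge ids. The class PropertyGraph and its method names are hypothetical and only meant to show how the pieces fit together.

```javascript
// Hypothetical in-memory property graph (illustrative names only).
class PropertyGraph {
  constructor() {
    this.vertices = new Map(); // vertex id -> { id, properties, outgoing, incoming }
    this.edges = new Map();    // edge id -> { id, tail, head, label, properties }
  }

  addVertex(id, properties = {}) {
    this.vertices.set(id, { id, properties, outgoing: new Set(), incoming: new Set() });
  }

  addEdge(id, tail, head, label, properties = {}) {
    if (!this.vertices.has(tail) || !this.vertices.has(head)) {
      throw new Error("both endpoints must exist before adding an edge");
    }
    this.edges.set(id, { id, tail, head, label, properties });
    this.vertices.get(tail).outgoing.add(id); // the edge starts at the tail vertex
    this.vertices.get(head).incoming.add(id); // the edge ends at the head vertex
  }

  // Follow outgoing edges with the given label and return the head vertex ids.
  neighbors(vertexId, label) {
    return [...this.vertices.get(vertexId).outgoing]
      .map((edgeId) => this.edges.get(edgeId))
      .filter((edge) => edge.label === label)
      .map((edge) => edge.head);
  }
}

const graph = new PropertyGraph();
graph.addVertex("idaho", { name: "Idaho", type: "state" });
graph.addVertex("usa", { name: "United States", type: "country" });
graph.addVertex("lucy", { name: "Lucy" });
graph.addEdge(1, "idaho", "usa", "within");
graph.addEdge(2, "lucy", "idaho", "born_in");

console.log(graph.neighbors("lucy", "born_in")); // [ 'idaho' ]
```

Keeping both outgoing and incoming sets on each vertex lets a traversal go in either direction along an edge.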
The graph can be stored in two relational tables, one for vertices and one for edges.
postgresql
CREATE TABLE vertices (
  vertex_id integer PRIMARY KEY,
  properties json
);
CREATE TABLE edges (
  edge_id integer PRIMARY KEY,
  tail_vertex integer REFERENCES vertices (vertex_id),
  head_vertex integer REFERENCES vertices (vertex_id),
  label text,
  properties json
);
CREATE INDEX edges_tails ON edges (tail_vertex);
CREATE INDEX edges_heads ON edges (head_vertex);
Good evolvability: as you add features to the application, a graph can be easily extended.
Cypher was created for Neo4j. It's a declarative query language for property graphs.
cypher
CREATE
(NAmerica:Location {name:'North America', type:'continent'}),
(USA:Location {name:'United States', type:'country' }),
(Idaho:Location {name:'Idaho', type:'state' }),
(Lucy:Person {name:'Lucy' }),
(Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica),
(Lucy) -[:BORN_IN]-> (Idaho)
Example: find the names of all the people who emigrated from the United States to Europe. That is, find all vertices that have a BORN_IN edge to a location within the US and also a LIVES_IN edge to a location within Europe, and return the name property of each of those vertices.
cypher
MATCH
(person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
(person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name
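You can run the same query in SQL, but it is more difficult. In a relational data model you usually know in advance which joins a query needs; here you have to traverse a variable number of edges, so the number of joins is not known in advance.
In Cypher, :WITHIN*0.. means "follow a WITHIN edge zero or more times", like the * operator in a regular expression. In SQL, the same traversal is written with a recursive common table expression:
sql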
WITH RECURSIVE
  -- in_usa is the set of vertex IDs of all locations within the United States
  in_usa(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'United States'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
        WHERE edges.label = 'within'
  ),
  -- in_europe is the set of vertex IDs of all locations within Europe
  in_europe(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'Europe'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
        WHERE edges.label = 'within'
  ),
  -- born_in_usa is the set of vertex IDs of all people born in the US
  born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'born_in'
  ),
  -- lives_in_europe is the set of vertex IDs of all people living in Europe
  lives_in_europe(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'lives_in'
  )
SELECT vertices.properties->>'name'
FROM vertices
-- join to find those people who were both born in the US *and* live in Europe
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
Clearly, Cypher is more suitable than SQL for this data model.
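To see what the variable-length traversal does under the hood, here is a rough JavaScript sketch that follows the same four steps as the SQL version. The edge list is made-up sample data (the lives_in edges, London, England and the person alain are assumptions for illustration), and the function names are hypothetical.

```javascript
// Edges as [tail, label, head]. Sample data is illustrative only.
const edges = [
  ["idaho", "within", "usa"],
  ["usa", "within", "namerica"],
  ["london", "within", "england"],
  ["england", "within", "europe"],
  ["lucy", "born_in", "idaho"],
  ["lucy", "lives_in", "london"],
  ["alain", "born_in", "london"],
  ["alain", "lives_in", "idaho"],
];

// `root` plus every location that reaches it by following "within" edges
// any number of times (the "zero or more" part of the pattern).
function locationsWithin(root) {
  const found = new Set([root]);
  const pending = [root];
  while (pending.length > 0) {
    const current = pending.pop();
    for (const [tail, label, head] of edges) {
      if (label === "within" && head === current && !found.has(tail)) {
        found.add(tail);
        pending.push(tail);
      }
    }
  }
  return found;
}

// Vertices that have a `label` edge pointing at any vertex in `locations`.
function peopleWithEdgeInto(label, locations) {
  return new Set(
    edges
      .filter(([, edgeLabel, head]) => edgeLabel === label && locations.has(head))
      .map(([tail]) => tail)
  );
}

const bornInUsa = peopleWithEdgeInto("born_in", locationsWithin("usa"));
const livesInEurope = peopleWithEdgeInto("lives_in", locationsWithin("europe"));
const emigrants = [...bornInUsa].filter((person) => livesInEurope.has(person));

console.log(emigrants); // [ 'lucy' ]
```

The loop in locationsWithin keeps going until no new location turns up, which is why the number of hops never has to be fixed in advance.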
The triple-store model is almost the same as the property graph model: different words, but the same ideas.
turtle
@prefix : <urn:example:>.
_:lucy a :Person.
_:lucy :name "Lucy".
_:lucy :bornIn _:idaho.
_:idaho a :Location.
_:idaho :name "Idaho".
_:idaho :type "state".
_:idaho :within _:usa.
_:usa a :Location.
_:usa :name "United States".
_:usa :type "country".
_:usa :within _:namerica.
_:namerica a :Location.
_:namerica :name "North America".
_:namerica :type "continent".
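As a rough illustration of how such data can be queried, the sketch below stores some of the statements above as three-element arrays and matches them against a pattern. It is a simplification: the match function is hypothetical, and plain strings stand in for both nodes and literal values.

```javascript
// Each statement is [subject, predicate, object], copied from the Turtle data.
const triples = [
  ["_:lucy", "a", ":Person"],
  ["_:lucy", ":name", "Lucy"],
  ["_:lucy", ":bornIn", "_:idaho"],
  ["_:idaho", "a", ":Location"],
  ["_:idaho", ":name", "Idaho"],
  ["_:idaho", ":type", "state"],
  ["_:idaho", ":within", "_:usa"],
  ["_:usa", "a", ":Location"],
  ["_:usa", ":name", "United States"],
];

// Hypothetical matcher: null in any position acts as a wildcard.
function match(subject, predicate, object) {
  return triples.filter(
    ([s, p, o]) =>
      (subject === null || s === subject) &&
      (predicate === null || p === predicate) &&
      (object === null || o === object)
  );
}

// Where was Lucy born, and what is that place called?
const [[, , birthplace]] = match("_:lucy", ":bornIn", null);
const [[, , birthplaceName]] = match(birthplace, ":name", null);
console.log(birthplaceName); // Idaho
```

A property on a vertex and an edge between two vertices are both just triples here; the only difference is whether the object is a value or another subject.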
NA
RDF: Resource Description Framework.
RDF can also be written as XML.
SPARQL is a query language for triple stores.
Cypher's pattern matching is borrowed from SPARQL, so the two look alike.
SPARQL
PREFIX : <urn:example:>
SELECT ?personName WHERE {
?person :name ?personName.
?person :bornIn / :within* / :name "United States".
?person :livesIn / :within* / :name "Europe".
}
Difference between CODASYL and the graph model:
CODASYL is the older model.