
Introduction to NoSQL

NoSQL stands for "Not Only SQL".

In modern computing, networked systems produce huge amounts of data.

A large part of this data is handled by relational database management systems (RDBMSs). In 1970, E. F. Codd proposed the relational model in his paper "A relational model of data for large shared data banks", which made data modeling and application programming easier.

In practice, the relational model proved very well suited to client-server programming, with benefits far beyond what was expected, and today it is the dominant technology for storing structured data in networked and business applications.

NoSQL is a new, revolutionary database movement. The idea was proposed early on, and the trend gained more and more momentum up to 2009. NoSQL advocates promote non-relational data storage; compared with the overwhelming use of relational databases, this concept undoubtedly brings a new way of thinking.

Relational databases follow the ACID rules

A database transaction is very similar to a transaction in the real world. It has the following four characteristics:

1. A (Atomicity)
Atomicity is easy to understand: the operations in a transaction are either all performed or not performed at all. A transaction succeeds only if every operation in it succeeds; if any single operation fails, the entire transaction fails and must be rolled back.

Take a bank transfer of 100 yuan from account A to account B. It has two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. Either both steps complete or neither does. If only the first step completed and the second failed, 100 yuan would inexplicably disappear.

2. C (Consistency)
Consistency is also fairly easy to understand: the database is always in a consistent state, and running a transaction does not violate the database's existing consistency constraints.

For example, given the integrity constraint a + b = 10, a transaction that changes a must also change b so that a + b = 10 still holds when the transaction ends; otherwise the transaction fails.

3. I (Isolation)
Isolation means that concurrent transactions do not affect each other. If the data one transaction wants to access is being modified by another transaction, then as long as that other transaction has not committed, the data being accessed is unaffected by the uncommitted changes.
For example, suppose a transaction is transferring 100 yuan from account A to account B. Until that transaction completes, B cannot see the newly added 100 yuan when checking the account.

4. D (Durability)
Durability means that once a transaction commits, its changes are stored permanently in the database and are not lost even if the system goes down.
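
The bank transfer described under atomicity can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, not how a real banking system is built: the table name accounts, the helper functions open_bank, balances and transfer, and the "balance must not go negative" rule are all assumptions made for the example. The point is that when the second step fails, the first step is rolled back as well.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with accounts A (100 yuan) and B (0 yuan)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both steps happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Step 1: withdraw from the source account.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            # Step 2: deposit into the destination account.
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise LookupError(f"no such account: {dst}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False


if __name__ == "__main__":
    bank = open_bank()
    # Account "C" does not exist, so step 2 fails and step 1 is undone.
    print(transfer(bank, "A", "C", 100), balances(bank))
    print(transfer(bank, "A", "B", 100), balances(bank))
```

The CHECK constraint plays the role of the consistency rule: a transfer that would leave an account negative is rejected and the database stays in its previous consistent state.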


Distributed Systems

A distributed system consists of software components running on multiple computers that communicate over a computer network (a local area network or a wide area network).

A distributed system is a software system built on top of a network. It is precisely because of the characteristics of this software that a distributed system has a high degree of cohesion and transparency.

Thus, the difference between a network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.

Distributed systems can run on different platforms, such as PCs, workstations, LANs and WANs.


Advantages of distributed computing

Reliability (fault tolerance):
An important advantage of a distributed computing system is reliability. The crash of one server does not affect the remaining servers.

Scalability:
More machines can be added to a distributed computing system as needed.

Resource sharing:
Sharing data is essential for applications such as banking and reservation systems.

Flexibility:
Because the system is very flexible, it is easy to install, implement and debug new services.

Faster speed:
A distributed computing system can combine the computing power of many computers, giving it a faster processing speed than other systems.

Open system:
Because it is an open system, its services can be accessed both locally and remotely.

Higher performance:
Compared with a centralized computer, a network cluster can provide higher performance (and a better price/performance ratio).


Disadvantages of distributed computing

Troubleshooting:
Problems are harder to troubleshoot and diagnose.

Software:
Limited software support is the main drawback of distributed computing systems.

Network:
Network infrastructure issues, including transmission problems, high load and lost information.

Security:
The open nature of the system leaves a distributed computing system exposed to security risks and data-sharing problems.


What is NoSQL?

NoSQL refers to non-relational databases. The name is sometimes expanded as "Not Only SQL". It is a collective term for database management systems that differ from traditional relational databases.

NoSQL is used to store data at very large scale (for example, Google or Facebook collect trillions of bits of data for their users every day). This kind of data storage does not require a fixed schema and can scale horizontally without extra operations.

Why NoSQL?

Today we can easily access and fetch data through third-party platforms (such as Google and Facebook). Users' personal information, social networks, locations, user-generated data and user activity logs have grown exponentially. If we want to mine this user data, SQL databases are no longer well suited to the job, whereas NoSQL databases were developed to handle such large volumes of data well.


Examples

Social networks:

Each record: UserID1, UserID2
Separate records: UserID, first_name, last_name, age, gender, ...
Task: Find all friends of friends of friends of ... friends of a given user.
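
The friends-of-friends task can be sketched as a breadth-first search over the friendship records. This is a minimal in-memory illustration, assuming each record is a (UserID1, UserID2) pair and that friendship is mutual; the function names build_graph and friends_within are hypothetical. In a relational database each extra "of friends" step typically means another self-join on the friendship table, which is why this kind of query is often used to motivate graph-oriented stores.

```python
from collections import defaultdict, deque


def build_graph(friendships):
    """Turn (UserID1, UserID2) records into an adjacency map."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)  # assumption: friendship is mutual
    return graph


def friends_within(graph, user, max_hops):
    """Return every user reachable from `user` in at most `max_hops` steps."""
    seen = {user}
    found = set()
    frontier = deque([(user, 0)])
    while frontier:
        current, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(current, ()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                frontier.append((friend, hops + 1))
    return found


if __name__ == "__main__":
    records = [(1, 2), (2, 3), (3, 4), (5, 6)]
    graph = build_graph(records)
    print(friends_within(graph, 1, 2))  # friends and friends of friends of user 1
```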

Wikipedia pages:

Large collection of documents
Combination of structured and unstructured data
Task: Retrieve all pages about athletics at the Summer Olympics before 1950.

RDBMS vs NoSQL

RDBMS
- Highly organized, structured data
- Structured Query Language (SQL)
- Data and relationships are stored in separate tables
- Data Manipulation Language, Data Definition Language
- Strict consistency
- Basic transactions

NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-value stores, column stores, document stores, graph databases
- Eventual consistency, rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- High performance, high availability and scalability


A Brief History of NoSQL

The term NoSQL first appeared in 1998 as the name of a lightweight, open-source relational database developed by Carlo Strozzi that did not provide an SQL interface.

In 2009, Johan Oskarsson of Last.fm organized a discussion on open-source distributed databases [2], and Eric Evans of Rackspace reintroduced the term NoSQL. By then, NoSQL mainly referred to database designs that are non-relational, distributed, and do not provide ACID guarantees.

The "no:sql(east)" conference held in Atlanta in 2009 was a milestone; its slogan was "select fun, profit from real_world where relational=false;". As a result, the most common interpretation of NoSQL is "non-relational", emphasizing the advantages of key-value stores and document databases rather than simply opposing RDBMSs.


CAP Theorem

In computer science, the CAP theorem, also known as Brewer's theorem, states that a distributed computing system cannot simultaneously satisfy all three of the following:

  • Consistency (all nodes see the same data at the same time)
  • Availability (every request receives a response, whether it succeeds or fails)
  • Partition tolerance (the loss or failure of part of the system, or of messages between nodes, does not stop the system from operating)

The core of the CAP theorem is that a distributed system cannot satisfy consistency, availability and partition tolerance all at once; it can satisfy at most two of the three.

Accordingly, NoSQL databases can be divided into three categories under the CAP theorem: those that satisfy CA, those that satisfy CP, and those that satisfy AP:

  • CA - single-site clusters that satisfy consistency and availability; usually not very strong on scalability.
  • CP - systems that satisfy consistency and partition tolerance; performance is usually not especially high.
  • AP - systems that satisfy availability and partition tolerance; consistency requirements are usually somewhat lower.
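
The CP and AP trade-off can be illustrated with a toy model of two nodes that lose contact with each other. This is a deliberately simplified sketch and not how any real database works: the ToyCluster class, its mode parameter and the partitioned flag are all hypothetical names invented for the example. In CP mode a write during a partition is refused (consistent but unavailable); in AP mode it is accepted by one node only (available but temporarily inconsistent).

```python
class ToyCluster:
    """Two nodes holding one value each; `mode` is either "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one stored value per node
        self.partitioned = False    # True while the nodes cannot communicate

    def write(self, node, value):
        """Try to write through `node`; return True if the write is accepted."""
        if not self.partitioned:
            self.values = [value, value]  # both nodes updated together
            return True
        if self.mode == "CP":
            return False  # give up availability to keep the nodes identical
        self.values[node] = value  # AP: accept locally, nodes now disagree
        return True

    def read(self, node):
        return self.values[node]


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        cluster = ToyCluster(mode)
        cluster.write(0, "x")
        cluster.partitioned = True
        accepted = cluster.write(0, "y")
        print(mode, accepted, cluster.read(0), cluster.read(1))
```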

Advantages and disadvantages of NoSQL

Advantages:

  • High scalability
  • Distributed computing
  • Low cost
  • Architectural flexibility, semi-structured data
  • No complex relationships

Disadvantages:

  • No standardization
  • Limited query capabilities (so far)
  • Eventual consistency is not intuitive to program against

BASE

BASE: Basically Available, Soft-state, Eventually Consistent. Defined by Eric Brewer.


BASE describes the relaxed availability and consistency requirements that NoSQL databases typically adopt:

  • Basically Available - the system is basically available.
  • Soft state - also described as flexible transactions. "Soft state" can be understood as "connectionless", whereas "hard state" is "connection-oriented".
  • Eventual Consistency - the data becomes consistent in the end, which is also the ultimate goal of ACID.
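
Eventual consistency can be sketched with two replicas that accept writes independently and reconcile later. This is a minimal illustration that assumes a last-write-wins rule based on a caller-supplied version number; the LwwReplica class and the sync function are hypothetical names, and real systems use a variety of other conflict-resolution strategies.

```python
class LwwReplica:
    """A replica that keeps, for each key, the value with the highest version."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.store.items()):
            dst.write(key, value, version)


if __name__ == "__main__":
    r1, r2 = LwwReplica(), LwwReplica()
    r1.write("profile", "old", version=1)
    r2.write("profile", "new", version=2)
    print(r1.read("profile"), r2.read("profile"))  # replicas disagree for now
    sync(r1, r2)
    print(r1.read("profile"), r2.read("profile"))  # replicas agree after sync
```

Between the two writes and the call to sync, a reader may see different values depending on which replica answers; this window of disagreement is the "soft state" that BASE accepts.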

ACID vs BASE

ACID | BASE
Atomicity | Basically Available
Consistency | Soft state / flexible transactions
Isolation | Eventual consistency
Durability |

NoSQL database classification

Column stores
Representatives: HBase, Cassandra, Hypertable
Features: As the name suggests, data is stored by column. The main strengths are that structured and semi-structured data are easy to store, data is easy to compress, and queries on one column or a few columns have a very large I/O advantage.

Document stores
Representatives: MongoDB, CouchDB
Features: Document stores generally hold JSON-like formats; the stored content is document-typed. This makes it possible to build indexes on certain fields and so provide some of the features of a relational database.

Key-value stores
Representatives: Tokyo Cabinet / Tyrant, Berkeley DB, MemcacheDB, Redis
Features: A value can be looked up quickly by its key. In general the store accepts the value as-is, regardless of its format. (Redis includes additional features.)

Graph stores
Representatives: Neo4J, FlockDB
Features: Best suited to storing graph relationships. Solving such problems with a traditional relational database performs poorly and is awkward to design.

Object stores
Representatives: db4o, Versant
Features: The database is operated through syntax similar to that of an object-oriented language, and data is accessed as objects.

XML databases
Representatives: Berkeley DB XML, BaseX
Features: Stores XML data efficiently and supports XML-native query syntax such as XQuery and XPath.
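
The difference between the key-value and document categories above can be sketched in a few lines. This is a toy in-memory illustration, not the API of Redis, MongoDB or any other product; the KeyValueStore and DocumentStore classes and their methods are hypothetical. The key-value store treats the value as opaque bytes, while the document store looks inside each document and keeps an index on one field.

```python
import json
from collections import defaultdict


class KeyValueStore:
    """Looks values up by key and never inspects their content."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DocumentStore:
    """Stores dict documents and indexes them on a single field."""

    def __init__(self, indexed_field):
        self._field = indexed_field
        self._docs = {}
        self._index = defaultdict(set)  # field value -> set of document ids

    def insert(self, doc_id, document):
        # Drop the old index entry if this id is being overwritten.
        old = self._docs.get(doc_id)
        if old is not None and self._field in old:
            self._index[old[self._field]].discard(doc_id)
        self._docs[doc_id] = document
        if self._field in document:
            # Assumption: indexed values are hashable (strings, numbers, ...).
            self._index[document[self._field]].add(doc_id)

    def find(self, value):
        """Return all documents whose indexed field equals `value`."""
        return [self._docs[i] for i in sorted(self._index.get(value, ()))]


if __name__ == "__main__":
    kv = KeyValueStore()
    kv.put("u1", json.dumps({"name": "Ann", "city": "Oslo"}).encode())
    print(kv.get("u1"))  # opaque bytes; the store cannot query by city

    docs = DocumentStore(indexed_field="city")
    docs.insert("u1", {"name": "Ann", "city": "Oslo"})
    docs.insert("u2", {"name": "Bo", "city": "Oslo"})
    print(docs.find("Oslo"))
```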


Who uses NoSQL

Many companies already use NoSQL:
  • Google
  • Facebook
  • Mozilla
  • Adobe
  • Foursquare
  • LinkedIn
  • Digg
  • McGraw-Hill Education
  • Vermont Public Radio