Document Stores

Document stores provide a native database management system for semi-structured data. Document stores also scale to gigabytes or terabytes of data, and typically millions or billions of records (a record being a JSON object or an XML document). Introduction to Document Stores…

January 2, 2025 · Reading Time: 6 minutes · By Xuanqiang Angelo Huang

Wide Column Storage

Introduction to Wide Column Storages: One of the bottlenecks of traditional relational databases is the speed of joins, which can be done in O(n) using a merge join, assuming indexes are present that keep the keys already sorted. The other solution, of just…

December 28, 2024 · Reading Time: 9 minutes · By Xuanqiang Angelo Huang

Apache Spark

Apache Spark is a newer framework that is faster than MapReduce (see Massive Parallel Processing). It is written in Scala and takes a more functional approach to programming. Spark extends the previous MapReduce framework to a generic distributed dataflow, properly modeled as a DAG. There…

December 27, 2024 · Reading Time: 9 minutes · By Xuanqiang Angelo Huang

Structured Query Language

A bit of history: SQL was invented in the early 1970s in Almaden (San Jose) at IBM (Don Chamberlin and Raymond Boyce worked on it) for the first relational database, called System R. Because of trademark issues it could not be called SEQUEL, so it was branded as SQL. SQL is a…

December 20, 2024 · Reading Time: 8 minutes · By Xuanqiang Angelo Huang

Data Cubes

Data cubes are a data format especially useful for read-heavy workloads. They were popularized in business environments where the main use of data was to produce reports (many reads). This also links with the OLAP (Online Analytical Processing) vs OLTP (Online Transaction Processing)…

December 20, 2024 · Reading Time: 4 minutes · By Xuanqiang Angelo Huang

Introduction to Big Data

Data science is similar to physics: it attempts to create theories of reality based on a formalism that another science provides. For physics it was mathematics; for data science it is computer science. Data has grown rapidly in recent years and has reached a…

December 20, 2024 · Reading Time: 10 minutes · By Xuanqiang Angelo Huang

HTTP and REST

HTTP is the acronym for HyperText Transfer Protocol. Main characteristics (3): Client-server communication: once the data has been exchanged the connection is closed, and there are very good caching policies (for example with proxies). Generic: because it is a protocol…

December 6, 2024 · Reading Time: 6 minutes · By Xuanqiang Angelo Huang

Querying Denormalized Data

JSONiq presents itself as an easy query language that can run everywhere. It attempts to solve common problems in SQL, i.e. the lack of support for nested data structures and for JSON data types. A nice thing about…

November 26, 2024 · Reading Time: 6 minutes · By Xuanqiang Angelo Huang