Data Models and Validation

A data model is an abstract view over the data that hides the way it is stored physically. This is the same idea found in (Codd 1970). This is why we should not modify data directly, but pass through some abstraction that maintains the properties of that specific data model. Data Models Tree view We can view all JSON and XML data, as presented in HTML and Markup, as trees. This structure is usually quite evident, as it is inherent in their design....

7 min · Xuanqiang 'Angelo' Huang

Distributed file systems

We want to know how to handle systems that hold a large amount of data. In the previous lesson we saw how to build scalable systems of huge dimensions with fast access, see Cloud Storage. Object storage can store billions of files; here we want to handle millions of petabyte-sized files. Desiderata of distributed file systems 🟩 In this case we have a file system. In 2004 Google created its own FS. With hundreds or thousands of machines, failures in the system are practically guaranteed....

8 min · Xuanqiang 'Angelo' Huang

Uniform Resource Identifier

URIs were THE true invention of Berners-Lee, as mentioned in Storia del web. The problem is having a way to identify a resource uniquely on the internet. Introduction The resource 🟩 A resource is any structure that is the object of exchange between applications within the World Wide Web. A resource can be anything, not just a file! It is therefore agnostic with respect to the content or the way the data is stored. In this setting, too, it becomes clear how important standards that allow communication are...

6 min · Xuanqiang 'Angelo' Huang

Introduction to Big Data

Data Science is similar to physics: it attempts to create theories of reality based on a formalism that another science provides. For physics it was mathematics, for data science it is computer science. Data has grown rapidly in recent years and has reached a volume that, counted in metres, would match the distance to Jupiter. The galaxy is on the order of 400 yottametres, a unit with $3 \cdot 8 = 24$ zeros after it....

6 min · Xuanqiang 'Angelo' Huang

Document Stores

Document stores provide a native database management system for semi-structured data. Document stores also scale to gigabytes or terabytes of data, and typically millions or billions of records (a record being a JSON object or an XML document). Introduction to Document Stores A document store, unlike a data lake, manages the data directly, and the users do not see the physical layout. Unlike data lakes, document stores prevent us from breaking data independence by reading the data file directly: they offer an automatic management service for semi-structured data that we need to store and read quickly....

2 min · Xuanqiang 'Angelo' Huang