HTTP and REST

HTTP is the acronym for HyperText Transfer Protocol. Main characteristics (3): Client-server communication: once the data has been exchanged the connection is closed, and there are very good caching policies (for example with proxies). Generic: because it is a protocol used to load a great many types of resources! Stateless: no information about past exchanges is kept; in a way we covered this in Sicurezza delle reti when we talked about stateless firewalls....

6 min · Xuanqiang 'Angelo' Huang

Uniform Resource Identifier

URIs were THE real invention of Berners-Lee, mentioned in Storia del web. The problem is having a way to identify a resource uniquely on the internet. Introduction The resource 🟩 A resource is any structure that is exchanged between applications within the World Wide Web. A resource can be anything, not just a file! It is therefore agnostic with respect to the content or the way the data is stored; here too it is clear how important it is to have standards that enable communication...

6 min · Xuanqiang 'Angelo' Huang

Introduction to Big Data

Data Science is similar to physics: it attempts to create theories of reality based on a formalism that another science provides. For physics it was mathematics; for data science it is computer science. Data has grown rapidly in recent years and has reached a size that, measured in metres, would equal the distance to Jupiter. The galaxy is on the order of 400 yottametres, where the prefix yotta stands for a 1 followed by $3 \cdot 8$ zeros....

6 min · Xuanqiang 'Angelo' Huang

Massive Parallel Processing

We have a group of mappers that divide the keys among a set of reducers, each of which then works on its own group of data. The bottleneck is the assignment step: when the mappers finish and need to hand the data over to the reducers. Introduction Common input formats 🟨 You need to know these well: shards, textual input, binary formats (Parquet and similar), CSV and similar. Sharding It is common practice to divide a big dataset into chunks (or shards): smaller parts that, once recomposed, give back the original dataset....

8 min · Xuanqiang 'Angelo' Huang
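
The sharding idea in the Massive Parallel Processing excerpt above can be illustrated with a minimal sketch. The function names `split_into_shards` and `recompose` and the fixed `shard_size` parameter are assumptions made for this illustration; they do not come from the post, and real systems typically shard by file or block size rather than by record count.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_shards(records: Sequence[T], shard_size: int) -> List[List[T]]:
    """Cut records into consecutive shards of at most shard_size items."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [
        list(records[start:start + shard_size])
        for start in range(0, len(records), shard_size)
    ]


def recompose(shards: List[List[T]]) -> List[T]:
    """Concatenate the shards in order to recover the original dataset."""
    return [record for shard in shards for record in shard]


# Hypothetical toy dataset: ten integer records.
dataset = list(range(10))
shards = split_into_shards(dataset, shard_size=4)

# Recomposing the shards gives back the original dataset.
assert recompose(shards) == dataset
```

Each shard could then be handed to a separate mapper; the last shard may be smaller than the others when the dataset size is not a multiple of `shard_size`.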

Cloud Storage

Paradigms of data storage ETL framework 🟩 This is the classical database approach: we load the data into the database and let the underlying system handle it. This method incurs the added cost of extracting, transforming and loading the previously stored data into an optimized format so that it can be used for views and the like. Data Lakes 🟩 We usually speak of Data Lakes when we store our data with distributed file systems or Cloud Storage: cheap ways to dump the data without caring about the possibility of modifying it....

17 min · Xuanqiang 'Angelo' Huang