Zacharias 🐝 Voulgaris

3 years ago · 2 min read

Data Synthetics without A.I. and Why This Adds Value to You as an Individual


Data Synthetics is a term I coined for the framework and processes involved in synthesizing data rather than just analyzing it. In my view, it is among the most significant developments in data science today, and it is one of the many applications of A.I.: specialized systems generate new data from a given dataset while maintaining the properties of the original.

But isn't there an abundance of data out there? Well, yes, but we could always use some more. The rationale is much like that of a fiction writer, who often prefers to create her own characters for a novel or a short story even though there are plenty of real-world people she could copy into her text. So, if you don't want to be part of someone else's work of fiction (especially one that gets published and read by many other people), you may want to keep your personally identifiable information (PII) from roaming free in the world. Some of that information you cannot change (e.g., health-related PII, aka PHI), so protecting it is of paramount importance.

Data synthetics can do this for you by creating new data that closely resembles the existing data, thereby putting an unbridgeable gap between your PII and the data that a predictive model, for example, actually uses. Because the general underlying pattern (aka the signal in the data) remains the same, the model's predictions can still be relevant to you.
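As a rough illustration of what "similar data, same signal, no shared records" could mean in practice, here is a minimal Python sketch. The numbers are made up for illustration, and the helper names `pearson` and `shared_records` are hypothetical; this is one simple way to sanity-check a synthetic dataset, not a description of any particular framework.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def shared_records(original, synthetic):
    """Records that appear verbatim in both datasets (ideally none)."""
    return set(original) & set(synthetic)

# Made-up toy numbers, for illustration only.
original = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
synthetic = [(1.5, 3.0), (2.5, 5.1), (3.5, 7.0), (4.5, 9.2)]

# The "signal" here is the strength of the linear relationship between columns.
r_original = pearson(*zip(*original))
r_synthetic = pearson(*zip(*synthetic))
overlap = shared_records(original, synthetic)

print("correlation (original): ", round(r_original, 3))
print("correlation (synthetic):", round(r_synthetic, 3))
print("records shared verbatim:", len(overlap))
```

A real check would look at more than one statistic (marginal distributions, for instance), but the idea is the same: the relationship between variables should survive while the individual records should not.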

Plenty of brilliant A.I. professionals, both scientists and engineers, have delved into this problem and come up with mathematically elegant solutions. One such solution is the Variational AutoEncoder (VAE), a kind of artificial neural network (ANN) that aims to figure out the underlying distributions of the data and create new data based on them. These distributions are a mathematical model that aims to describe the signal. It is not the only such model and probably not the best one either, but it is good enough for something basic. The problem with VAEs (and other A.I. systems) is that they need sufficiently large datasets to figure out this signal and manifest it in new data. Additionally, building a VAE isn't simple unless you understand the technology and the not-so-trivial math involved.
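The core idea of "estimate the underlying distribution, then draw new data from it" can be sketched in a few lines of Python. To be clear about the assumptions: this is not a VAE. It assumes the data has just two continuous columns and is roughly Gaussian, so the whole "model" is two means, two standard deviations and one correlation. The function names `fit_gaussian_2d` and `sample_gaussian_2d` and the stand-in dataset are hypothetical, chosen only for this illustration.

```python
import math
import random

def fit_gaussian_2d(records):
    """Estimate means, standard deviations and correlation of (x, y) pairs."""
    n = len(records)
    xs, ys = zip(*records)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mx) * (y - my) for x, y in records) / (n - 1)
    return mx, my, sx, sy, cov / (sx * sy)

def sample_gaussian_2d(params, size, rng):
    """Draw new (x, y) pairs from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    out = []
    for _ in range(size):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 into y so that x and y end up with correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        out.append((x, y))
    return out

rng = random.Random(42)

# Stand-in for a real dataset: y depends linearly on x, plus noise.
original = []
for _ in range(500):
    x = rng.gauss(50, 10)
    original.append((x, 2 * x + rng.gauss(0, 5)))

params = fit_gaussian_2d(original)
synthetic = sample_gaussian_2d(params, 500, rng)
refit = fit_gaussian_2d(synthetic)

print("original  correlation:", round(params[4], 3))
print("synthetic correlation:", round(refit[4], 3))
```

A VAE plays the same game with a far more flexible, learned description of the distribution, which is where the need for large datasets and heavier math comes from.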

What if there were a way to develop synthetic data without utilizing A.I.? What if all you needed to know was the math you learned in school and a few other things built on that math, elegant but not overly sophisticated? Well, that's what I've done recently, with enough success to consider it usable and useful. This framework, which I call ROOF (hence the picture at the top) and which I developed in Julia 1.5, is light on computational resources and can be applied to any kind of continuous data (there is also a version for ordinal data, though I imagine that's not something you care about that much). If you are in this line of work or know someone who is, feel free to reach out to me. Cheers!

