Zacharias 🐝 Voulgaris

3 years ago · 2 min. read

When AI Errors Are a Data Problem (or How the GIGO Rule Applies to AI)

Regardless of your role or your level of understanding, there is little doubt that AI is a fascinating field. Even when we don't use it properly or when it exhibits problematic behavior (e.g., racial biases), it's still interesting to explore the hows and whys of it. In this article, we'll look at AI's relationship with data, particularly the data we use to train it. I'll avoid any technical jargon so that everyone can get something useful out of this text. So, let's get started, shall we?

First of all, why is data so important, even for something as sophisticated as an AI system? Well, if you think about it, any intelligent decision we make is based on information. The latter usually stems from data. Even if the exact relationship between data and information is opaque, there is no doubt that the two are linked in various ways. In AI, things aren't any different. The only difference is that, instead of biological neurons, the "brain" of an AI processes data using artificial neurons, which, in the vast majority of cases, make up an artificial neural network (ANN). So, it all boils down to what data we use to brew this information that will drive the decisions at hand.

However, data comes in all shapes and forms, and not all of it is of the same value (or veracity, to be more accurate). Just as some platforms are full of it when it comes to content while others maintain some standards, the datasets at our disposal vary in quality. It's one such dataset that will be used to train an AI system, enabling it to make its decisions, so the dataset we choose significantly affects the final result. An efficient AI system (e.g., a state-of-the-art one) will do a great job at processing the data, even cleaning it to some extent. However, it's an AI, not a miracle worker, so if you give it garbage to train with, don't expect unicorns and rainbows at the other end.

All this is reminiscent of an adage in Computer Science called the GIGO rule. This acronym stands for Garbage In, Garbage Out and illustrates how, if you feed a computer program garbage, it's going to spit out garbage. Take, for example, a spreadsheet. If the data you put in its cells doesn't make any sense (e.g., it's random numbers), the stuff in the pivot tables and other cells, where the results live, is bound to be equally useless. An AI isn't much different, though as a bonus, you get to waste computational resources (e.g., RAM and computing power) if you make such a mistake.
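For readers who like to see the GIGO rule in action, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the rule y = 2x + 1, the helper names `fit_line` and `mean_abs_error`, and the random "garbage" values are all made up for this example, and a simple straight-line fit stands in for a real AI system.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and ys."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = list(range(20))

# Hypothetical "true" rule we would like the model to learn.
clean_ys = [2 * x + 1 for x in xs]
# Garbage: random numbers that have nothing to do with xs.
garbage_ys = [rng.uniform(0, 40) for _ in xs]

clean_model = fit_line(xs, clean_ys)
garbage_model = fit_line(xs, garbage_ys)

# Both models are judged against the true rule.
clean_error = mean_abs_error(clean_model, xs, clean_ys)
garbage_error = mean_abs_error(garbage_model, xs, clean_ys)

print("Error after training on clean data:  ", clean_error)
print("Error after training on garbage data:", garbage_error)
```

The fitting procedure is identical in both runs; only the training data differs, and so does the quality of what comes out.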

When Big Data made its debut, many people wrote about it. One of the key books I read on the topic was one from IBM, which had developed Big Data software at the time and wanted to educate the world about the potential of this new resource. In that book, the authors (who were seasoned professionals in various roles) dedicated several pages to the 4 Vs of big data (you may hear about the 3 Vs or even the 6 Vs of big data in other places), namely Volume, Velocity, Variety, and Veracity. The last one is the one that many people forget about, but it's crucial. If you have lots and lots of data, some of it moving fast (e.g., a trading data stream) and taking various forms (e.g., some of it coming from a database, other parts coming from Twitter, etc.), naturally not all of it will carry a strong enough signal (information). Parts of this amalgamation of data are going to be useless and are better off being jettisoned. What's left is something that a data scientist would clean, organize, and process to build a model that will provide some useful insights and (ideally) a service you can use even without that expert being present.
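As a rough illustration of jettisoning the useless parts, here is a small Python sketch. Everything in it is hypothetical: the records, the `KNOWN_SOURCES` set, and the `is_plausible` check are assumptions made up for this example, and real veracity checks would depend heavily on the data at hand.

```python
# Hypothetical set of sources we have decided to trust.
KNOWN_SOURCES = {"database", "stream", "twitter"}

# Hypothetical records pulled together from different places.
records = [
    {"source": "database", "price": 101.5},
    {"source": "twitter", "price": None},        # missing value
    {"source": "stream", "price": -3.0},         # impossible price
    {"source": "unknown_feed", "price": 99.0},   # source we cannot vouch for
    {"source": "stream", "price": 102.1},
]


def is_plausible(record):
    """Keep a record only if its price is a positive number
    and it comes from a known source."""
    price = record.get("price")
    has_valid_price = isinstance(price, (int, float)) and price > 0
    return has_valid_price and record.get("source") in KNOWN_SOURCES


kept = [r for r in records if is_plausible(r)]
dropped = len(records) - len(kept)

print("Kept", len(kept), "records and dropped", dropped)
```

Only the records that pass the check would move on to the cleaning and modeling stages described above.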

Things haven't changed much since then, though the tools have evolved, with AI being in the limelight. Still, veracity is crucial, which is why we need to be mindful of the quality of the data we use. Otherwise, if an AI makes a blunder, we only need to look in the mirror to find the culprit!


If you are interested in AI and similar topics, feel free to check out my blog. Cheers!

Comments
Thank you for the share Fay Vietmeier!

Fay Vietmeier

3 years ago #2

The increasing VALUE of veracity.

Fay Vietmeier

3 years ago #1

Zacharias 🐝 Voulgaris I will share, Zacharias, that I am very "technology-challenged" (my son, who is 23, would say "amen") .. but I am coachable and gifted with curiosity - I love knowing how things work .. so many thanks for explaining things in a way that is understandable: "So, it all boils down to what data we use to brew this information that will drive the decisions at hand." In the first few sentences .. my 1st thought was "garbage in .. garbage out" (confirmed here): "However, it's an AI, not a miracle worker, so if you give it garbage to train with, don't expect unicorns and rainbows at the other end." (and you go on to mention) the "adage in Computer Science called the GIGO rule." Making me think, well, I have a clue ;~) "Veracity" IS of great VALUE in a world that is increasingly full of mis-information & an even greater disregard for Truth
