
GPT-4o is amazing at describing images

I decided to try out ChatGPT’s newly-released GPT-4o model (the lowercase “o” stands for “omni”) by uploading images and asking it to describe them to me.

After this quick test, my impression is simply “wow.”

Image 1: Me and Steven Bosco playing together at the Anaconda party at PyCon US 2022


Me: What’s happening in this picture?

GPT-4o: The picture shows two musicians performing on stage. One musician is playing an accordion, and the other is playing an acoustic guitar and singing into a microphone. The guitarist is wearing a cowboy hat. In the background, there is a banner with the name “Steven Bosco” and a QR code for tips via Venmo and CashApp. The setting appears to be an indoor venue with a rustic, cozy ambiance. There is a small audience in the foreground, and the atmosphere looks lively and engaging.

Not bad. I was impressed that ChatGPT was able to determine that the banner read “Steven Bosco” even though my head clearly blocks the “B”; it would have had to infer the “Bosco” from the tip sign. I was also impressed by its description of the “feel” of the room.

Image 2: A comic I drew as a storyboard for an app about authentication and authorization


Me: What is being depicted in this image?

GPT-4o: The image is a two-panel cartoon.

Panel 1:

  • The setting is a ship’s engine room filled with various controls and equipment.
  • There is a large machine, a computer, and a control panel with levers.
  • The text reads: “A ship’s engine room is full of important controls! If you don’t manage access to these controls, things can go very wrong, very quickly!”

Panel 2:

  • The scene shows a cat wearing a chef’s hat and holding a spoon, standing outside a door labeled “ENGINE ROOM.”
  • Next to the door is a sign that reads: “CHEF ACCESS DENIED.”
  • The cat is saying, “But I wanted to play with the big red button!”
  • The text reads: “For safety’s sake, the ship’s fine-grained authorization allows only engineers and the command crew inside the engine room.”

The cartoon humorously illustrates the importance of restricting access to critical areas on a ship, such as the engine room, to prevent unauthorized personnel from tampering with important controls.

I was impressed by how it identified the image not just as a comic, but as one with two panels. Even more impressive: ChatGPT got the general gist of the comic, right down to its use of humor to illustrate its point.

In case you were wondering, I wrote about the comic in this article.

Next, I wanted to see how well GPT-4o does with handwriting. I’m told that my penmanship is unusually readable; results with other people’s handwriting are bound to vary.

Image 3: My notes from my computer networking class at Crazy Go Nuts University

Me: What’s this image about?

GPT-4o: The image is a detailed handwritten study guide about the OSI (Open Systems Interconnection) model, often referred to as the OSI seven-layer model or OSI seven-layer cake. The OSI model is used to standardize networking functions to support interoperability among different products and software.

The guide includes the following:

Left Side:

  • An explanation of OSI layers and their function.
    • “Provide service to the layer higher.”
    • “Relies on next lower layer to perform.”
  • A diagram showing the communication between two systems, with peer protocols between corresponding layers.
  • A reminder: “Peer layers must provide the same functions, but they need not provide them in the same way.”

Right Side:

  • A detailed description of each of the seven OSI layers:
    1. Physical Layer: Deals with the physical connection between devices, transmission of raw bit streams over a physical medium.
    2. Data Link Layer: Ensures error-free transmission between two directly connected nodes, managing frame synchronization, error control, and flow control.
    3. Network Layer: Manages data routing, forwarding, addressing, and packet switching.
    4. Transport Layer: Provides reliable data transfer services to the upper layers, including flow control, error correction, and segmentation.
    5. Session Layer: Manages sessions between applications, providing services like establishment, maintenance, and termination of connections.
    6. Presentation Layer: Translates data between the application layer and the network, handling data encryption, decryption, and compression.
    7. Application Layer: Provides network services directly to end-user applications, including email, file transfer, and network management.
  • A final note: “Remember: Standards must define the functions & services to be provided by a layer but say nothing about how that’s to be done.”

This guide is likely intended for students or professionals studying networking concepts, providing a clear and concise overview of the OSI model and its layers.

The challenge with this one was the low resolution of the photo. I took it with my late ’90s-era camera with its potato-fidelity 1024 × 768 resolution (and that was its “high-quality” mode; normal mode was 640 × 480!).

Image 4: Another comic I drew as a storyboard for that app about authentication and authorization


Me: Describe this image, please.

GPT-4o: This image is a four-panel cartoon strip that humorously depicts a situation involving an alien, a disguise kit, and a spaceship. Here’s a description of each panel:

Panel 1:

  • A character with antennae and a space uniform is holding a box labeled “PLATYPUS DISGUISE KIT.”
  • The character is thinking, “Let’s see if this works!”

Panel 2:

  • The same character, now disguised with a platypus head, is returning to a spaceship.
  • The character says, “It is I, Captain Platypus, returning to the ship!”
  • Another character on the ship welcomes him, saying, “Welcome back, Captain! Come aboard.”
  • In the background, someone is seen thinking, “Aw, yeah—free ship!”

Panel 3:

  • The disguised character is now onboard the ship.
  • The text reads: “Once he boards the ship…”
  • The disguised character says, “Nice ship. I think I’ll take it.”
  • Another character on the ship is startled and says, “eep!”

Panel 4:

  • The real captain, presumably Captain Platypus, returns and is shocked by the situation.
  • He exclaims, “I was gone only five minutes… WHAT HAPPENED?”
  • Another character, who appears to be tied up, responds, “I think we need an anomaly detection system.”

The cartoon humorously highlights the importance of security and the potential consequences of not having proper identification and detection systems in place on a spaceship.

This is also a comic from this article.

One of the challenges with this image is that it’s a four-panel comic, but there are no lines separating the panels — just white space. Following only the circled numbers specifying the order in which the comic should be read, GPT-4o “got” the comic, including the fact that the character in panel 1 was impersonating Captain Platypus.

GPT-4o got one thing wrong — it thought that the “Aw, yeah — free ship!” line is from panel 2; it’s the alien stealing the ship in panel 4.
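By the way, if you’d rather run this kind of test from code instead of the ChatGPT interface, here’s a minimal sketch using the OpenAI Python library. The image URL is a placeholder, and it assumes your OPENAI_API_KEY environment variable is already set:

```python
# Minimal sketch: ask GPT-4o to describe an image via the OpenAI Python library.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/some-photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```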


AI coding in Python? Just use Anaconda.

A growing number of people have been asking me the same question: “I want to get started learning and building AI apps with Python. Should I go with the Python.org distribution, Anaconda, or something else?”

For anyone who wants to learn Python and AI development, my answer is invariably:

Just. Use. Anaconda.

Why?

  1. Their one installer gets you a LOT of stuff: Not just Python, but also the R programming language, the pip and conda package managers, Jupyter Notebook and Jupyter Lab, and a ridiculous number of installed and optimized Python packages ready to go (see the quick sanity check after this list).
  2. Anaconda is made by people into scientific computing, data science, and machine learning, for people into scientific computing, data science, and machine learning. You have to do way less setup after installing it to get started on your AI journey.
  3. Anaconda is proven. Nearly 40 million people use it, and so do 90% of the Fortune 500.
  4. There’s also Anaconda Cloud, which features an online notebook service (an online interactive coding environment that also makes it easy to share code), articles and online courses, and a big user community.
  5. Anaconda delivers the same great Python programming experience on Linux, macOS, and Windows. If you move between operating systems like I do, or teach Python courses, this is incredibly handy.
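As a quick illustration of point 1: here’s the kind of sanity check you can run in a Python session or Jupyter notebook right after installing. With the full Anaconda distribution, these imports should work without any extra pip install or conda install steps (exact versions will vary, and a slimmer install such as Miniconda would need these packages added by hand):

```python
# Quick sanity check for a fresh, full Anaconda install: these packages ship
# with the distribution, so they should import without any extra installs.
import sys

import matplotlib
import numpy as np
import pandas as pd
import sklearn

print(f"Python {sys.version.split()[0]}")
print(f"numpy {np.__version__}")
print(f"pandas {pd.__version__}")
print(f"scikit-learn {sklearn.__version__}")
print(f"matplotlib {matplotlib.__version__}")
```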

Once again — if you’re learning Python for AI development:

Just. Use. Anaconda.

ℹ️ Disclaimer: This is NOT a paid promo for Anaconda. I met some of their people at PyCon, but the company has no idea who I am or that I’m doing this, and they certainly don’t know how to put money in my bank account. I’m just a satisfied user who uses it for my own projects and for teaching Python and AI programming.


Easier ways to learn how neural networks work

If you’ve tried to go beyond APIs like the ones OpenAI offers and learn how the models behind them work “under the hood” by trying to build your own neural network, you might find yourself hitting a wall when the material opens with equations like this:
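Here’s a representative stand-in for the kind of expression I mean (the exact equation varies from book to book): the error gradient for a single weight $w_{jk}$ in a small network, where $\sigma$ is the sigmoid activation function, $o_j$ and $o_k$ are node outputs, and $t_k$ is the target output:

\[
\frac{\partial E}{\partial w_{jk}} = -\left(t_k - o_k\right)\,\sigma\!\left(\sum_j w_{jk}\,o_j\right)\left(1 - \sigma\!\left(\sum_j w_{jk}\,o_j\right)\right)\,o_j
\]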

How can you learn how neural networks — or more accurately, artificial neural networks — do what they do without a degree in math, computer science, or engineering?

There are a couple of ways:

  1. Follow this blog. Over the next few months, I’ll cover this topic, complete with getting you up to speed on the required math. Of course, if you’re feeling impatient…
  2. Read Tariq Rashid’s book, Make Your Own Neural Network. Written for people who aren’t math, computer science, or engineering experts, it first shows you the principles behind neural networks and then leaps from the theoretical to the practical by taking those principles and turning them into working Python code.

Along the way, both I (in this blog) and Tariq (in his book) will trick you into learning a little science, a little math, and a little Python programming. In the end, you’ll understand equations like the one above!
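To give you an early taste of where this is going, here’s a single artificial neuron in a few lines of Python: it takes some inputs, weights them, adds them up, and squashes the total through a sigmoid function. The inputs and weights below are made-up numbers, just for illustration:

```python
# A single artificial neuron: weighted sum of inputs, then a sigmoid "squash."
# The inputs and weights are made-up numbers for illustration.
import numpy as np

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.9, 0.1, 0.8])    # signals coming into the neuron
weights = np.array([0.4, -0.2, 0.6])  # how strongly the neuron weighs each input

output = sigmoid(np.dot(weights, inputs))
print(f"Neuron output: {output:.3f}")  # prints roughly 0.694
```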

One more thing: if you prefer your learning via video…

  1. The Global Nerdy YouTube channel will be kicking it into high gear soon. If you’d like, you can follow it now!
  2. Watch 3Blue1Brown’s video on how neural networks work:

I’m speaking on the “AI Superpowers Unlocked” panel on May 15!

Masterminds Tampa Bay is holding their AI Superpowers Unlocked panel on Wednesday, May 15th with the following panelists:

  • Ken Pomella, CEO of RevStar, known for leveraging AI to enhance business growth and scalability.
  • ​Lenar Mukhamadiev, from IdelSoft, focusing on GenAI solutions for organizations and developing an AI-powered startup.
  • Sat Ramphal, CEO of Maya AI, a serial entrepreneur with deep expertise in AI applications in regulated industries.
  • Yours Truly, Joey de Villa, Supreme Developer Advocate for Unified.to, AI enthusiast, Python instructor, and general computational person about town.

Here’s Tampa Bay Masterminds’ description of the event:

Unlock the future of Artificial Intelligence at “AI Superpowers Unlocked: An Expert Panel,” an event meticulously crafted for entrepreneurs, tech enthusiasts, and forward-thinkers ready to explore AI’s transformative potential. Join us to gain practical insights on becoming a leader in AI application and connect with industry pioneers.

AGENDA

  • 6:00 PM – 6:30 PM: Socializing Time
  • 6:30 PM – 7:15 PM: Expert Panel Discussion
  • 7:15 PM – 8:00 PM: Audience Q&A / Conclusion

Main Takeaways:

​🧠 Understand the crucial role AI plays and why mastering it is essential.

​🧠 Learn strategies to best leverage AI for 2024 and beyond.

​🧠 Discover essential AI tools beyond ChatGPT.

​🧠 Explore best practices, ethics, and more through interactive FAQs.

Why You Should Attend:

🚀 Tailored for Forward-Thinkers: Designed for those poised to disrupt markets and lead innovations, this panel will help you stay ahead in the AI curve.

🚀 Unparalleled Insights: Spend an hour with AI luminaries discussing strategies and visionary applications to outpace competitors and drive success.

🚀 Networking Opportunity: Connect with like-minded professionals and innovators, and perhaps discover your next great collaboration.

This is a paid event — attendance is $35 and supports Tampa Bay Masterminds’ mission of fostering innovation and education in technology, with all ticket sales considered donations.

Want to attend? Register at lu.ma/superpowers!


Retrieval-augmented generation explained “Star Wars” style

By popular demand, here are the “slides” from my presentation this morning at Civo Navigate Local Tampa, Make Smarter AI Apps with RAG!

Retrieval-Augmented Generation, also known as RAG for short, is an AI technique that combines…

  • A machine learning model with
  • A mechanism for retrieving additional information that the model doesn’t have

…to enhance or improve the responses generated by the model.

At this point, you’re probably thinking this:

This talk runs from 11:15 to 11:30 a.m., which is just before lunch, and I’m not at my cognitive best. Can you explain RAG in an easy-to-digest way, possibly using Star Wars characters?

I’m only too happy to oblige!

Consider the case where you ask an LLM a question that it doesn’t “know” the answer to. The exchange ends up something like this:

Tap to view at full size.

With retrieval-augmented generation, you improve the response by augmenting the prompt you send to the LLM with data or computation from an external source:

Tap to view at full size.

Because RAG provides additional information to the LLM, it solves two key problems:

Tap to view at full size.

Here’s a lower-level view of RAG — it starts with the cleaning and conversion of the supplementary data:

Tap to view at full size.

Once that supplementary data has been cleaned and converted, the next step is to split it into small chunks of equal size:

Tap to view at full size.
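If you’re wondering what that looks like in code, here’s a bare-bones version. Real pipelines usually split on sentence or paragraph boundaries and overlap neighboring chunks, but the basic idea is the same:

```python
# Bare-bones chunking: split cleaned text into fixed-size pieces.
# Real pipelines usually respect sentence boundaries and overlap chunks.
def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

cleaned_text = "C-3PO is a protocol droid built by Anakin Skywalker on Tatooine. " * 10
chunks = chunk_text(cleaned_text)
print(f"{len(chunks)} chunks; first chunk: {chunks[0]!r}")
```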

Those chunks are then converted into vectors (a step called embedding). If you’re not really into math but are into programming, think of vectors as arrays of numbers. Each of the numbers in the vector is a value between 0.0 and 1.0, and each vector typically has hundreds of elements. In the diagram below, I’ve greatly simplified the vectors so that they’re made up of only three elements:

Tap to view at full size.
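In real systems, the conversion is done by an embedding model (OpenAI’s embedding endpoints, sentence-transformers, and so on), and the resulting vectors have hundreds or thousands of elements. Here’s a toy stand-in that sticks with my simplified three-element vectors, just to make the idea concrete; the three “dimensions” and their scoring are completely made up:

```python
# A toy "embedding": map a chunk of text to a three-element vector, mirroring
# the simplified vectors in the slides. The three dimensions and their scoring
# are invented for illustration; real embeddings come from a trained model.
def toy_embed(text: str) -> list[float]:
    text = text.lower()
    themes = [
        ("droid", "threepio", "protocol"),   # dimension 1: droid-ness
        ("anakin", "skywalker", "built"),    # dimension 2: origin story
        ("ship", "falcon", "hyperdrive"),    # dimension 3: spaceship talk
    ]
    return [
        min(1.0, sum(text.count(word) for word in theme) / 3.0)
        for theme in themes
    ]

print(toy_embed("Threepio is a protocol droid built by Anakin Skywalker."))
# -> [1.0, 1.0, 0.0]
```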

The whole process of cleaning/converting, then chunking, then embedding is called indexing:

Tap to view at full size.

Now that you know what’s happening “under the hood,” let’s revisit the RAG diagram, but with more detail:

Tap to view at full size.

Here’s what’s happening (there’s a minimal Python sketch of these steps right after the list):

  1. Luke asks the question: “Who built you, Threepio?” That’s the query.
  2. The query is converted into a vector.
  3. The “vectorized” query is compared against the vectors that make up the supplementary information — the vectorstore — and the system retrieves a small set of the vectors that are most similar to the query vector.
  4. The query and the supplementary information retrieved from the vectorstore are combined into a prompt.
  5. The prompt is then sent to the LLM.
  6. The LLM responds to the prompt.
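Here’s a minimal Python sketch of those steps, reusing the toy three-element embedding from earlier and a handful of hard-coded chunks as the vectorstore. Everything here is a stand-in for illustration; a real system would use a proper embedding model, a vector database, and an actual call to an LLM:

```python
# A toy end-to-end RAG loop: embed the query, retrieve the most similar chunks,
# and build an augmented prompt. In a real system, the prompt would then be
# sent to an LLM; all of the data and scoring here is made up for illustration.
import math

def toy_embed(text: str) -> list[float]:
    """The same toy three-element 'embedding' used earlier."""
    text = text.lower()
    themes = [
        ("droid", "threepio", "protocol"),
        ("anakin", "skywalker", "built"),
        ("ship", "falcon", "hyperdrive"),
    ]
    return [min(1.0, sum(text.count(w) for w in theme) / 3.0) for theme in themes]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The "vectorstore": each supplementary chunk stored alongside its vector.
chunks = [
    "C-3PO is a protocol droid built by Anakin Skywalker on Tatooine.",
    "The Millennium Falcon's hyperdrive breaks down at the worst possible times.",
    "R2-D2 is an astromech droid that served aboard Naboo starships.",
]
vectorstore = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Steps 1 and 2: the query, converted into a vector.
query = "Who built you, Threepio?"
query_vector = toy_embed(query)

# Step 3: retrieve the chunks whose vectors are most similar to the query vector.
top_chunks = sorted(
    vectorstore,
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)[:2]

# Step 4: combine the query and the retrieved information into a prompt.
context = "\n".join(chunk for chunk, _ in top_chunks)
prompt = f"Using this context:\n{context}\n\nAnswer this question: {query}"

# Steps 5 and 6: in a real system, you'd now send `prompt` to an LLM
# and show the user its response.
print(prompt)
```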

That was the “hand-wavey” part of my lightning talk. The rest of the talk was a demo of a simple RAG system written in Python and running in a Jupyter Notebook. If you’re really curious and want to see the code, you can download the Jupyter Notebook here.


Video and materials from my “AI: How to Jump in Right Away” presentation for Austin Forum

On Tuesday, April 2nd at 6:15 p.m. Central / 7:15 p.m. Eastern / 23:15 UTC, I led AI: How to Jump In Right Away, an online introductory session for people who want to dive into AI. The recording’s now on YouTube, and you can watch my presentation in its entirety there!

My session was part of Austin Forum on Technology and Society’s third annual AI April, a month of presentations, events, and podcasts dedicated to AI capabilities, applications, future impacts, challenges, and more.

Here are links to the video and supplementary material for the presentation:


The best damned intro to a book on machine learning ever…

…appears in Vasily “vas3k” Zubarev’s Machine Learning for Everyone, which begins with:

Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it.