AI Agent

The Battle of the Giants

What the latest OpenAI and Google AI releases mean, and why real value comes from industry-specific AI, not just bigger models.
Yael Meretyk Hanan
3 min read
16 May 2024

The AI world was buzzing this past week as the arm wrestling between OpenAI and Google heated up, with major announcements from both players.

Hello GPT-4o

OpenAI launches GPT-4o (the “o” stands for omni) and grants free access to many features previously reserved for Plus subscribers, such as data analysis and the GPT Store (the custom chatbots).

What’s exciting about this?

  • A new voice-based personal assistant that can infer emotions and respond to audio as quickly as a human would in conversation.
  • A model that is multimodal by design, meaning it was built and trained to handle different forms of information like speech, images, and video content.
  • A significant improvement in creating detailed, realistic images from text descriptions.
  • It’s free! A lot of the previously paid features are now free. That positions OpenAI as the “true AI evangelists” and hurts the monetization efforts of their competitors.

What’s the drawback?

  • Early test results show that GPT-4o still makes mistakes and may still “hallucinate” on tasks from key AI benchmarks.
  • Despite significant UX improvements, this time OpenAI appears to have focused on consumer needs rather than on enterprise customers.

Howdy Gemini

Not to be left behind, Google expands Gemini’s offerings in two forms: Gemini 1.5 Pro, which excels at complicated tasks, and Gemini 1.5 Flash, which prioritizes speed and affordability.

What’s exciting about this?

  • Gemini expanded the context window to 2 million tokens. That means Gemini can now ingest and understand multiple large documents — up to 1,500 pages, the longest yet.
  • The ability to analyze live video streams has improved dramatically. That enables a whole new avenue of exciting applications.
  • The Flash model is designed to improve performance while reducing computational load, which is critical for lowering cost and latency.
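
To get a feel for what a 2 million token window means in practice, here is a minimal Python sketch that estimates whether a set of documents would fit. It assumes roughly 4 characters per token, which is only a common rule of thumb for English text and not a Gemini constant. The function names are hypothetical, and a real check should rely on the token counts reported by the model provider.

```python
# Rough check of whether a set of documents fits in a long context window.
# ASSUMPTION: ~4 characters per token is a rule of thumb for English text;
# real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4          # assumed heuristic, not a Gemini constant
CONTEXT_WINDOW = 2_000_000   # the 2 million token figure quoted above


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its length in characters."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str], reserve: int = 0) -> bool:
    """Return True if all documents fit, leaving `reserve` tokens for the reply."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve <= CONTEXT_WINDOW
```

For example, calling fits_in_context on the text of a project specification and its contract documents, with a few thousand tokens reserved for the answer, gives a quick first indication of whether they can be sent in a single request.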

What’s the drawback?

  • Gemini Flash isn’t really built for consumers or even enterprises. It is built for developers who want to build faster AI software.

What’s in it for the construction industry?

First of all (some self-promotion): you can now use the Pelles GPT for Engineers and the Pelles GPT for Estimators for free! Yay!

On a serious note, as foundation models continue to improve and expand their offerings while lowering their prices, we can expect a tidal wave of AI applications. Some of these improvements, such as enhanced multimodal abilities, can directly impact the construction industry.

However, truly useful applications must cater to the industry’s specific needs, challenges, and workflows, which a generic AI model cannot do on its own.

As models develop and advance, the main beneficiaries will be us, the users. We will be able to improve our workflows and focus on what’s important. However, how well the AI performs the tasks we give it will depend on the quality of the data we feed into these models, as well as on our ability to direct them toward specific tasks with domain-expertise guidance and quality checks on the tools we build.
