The study I'm referencing tested 35 AI models across 172 billion tokens of document information (in other words: the testing was robust). Here's what the study found:
Even when the AI was handed the document and then asked about that very document (i.e. -- not asked to answer from memory, but from the data right in front of its virtual face), the very best AI model in the study fabricated answers 1.19% of the time. And that's under optimal conditions, not in a stressful real-world application.
Most top models fabricated at rates of 5% to 7%. With the answers in hand. The human equivalent would be looking at a page and still making up what it said.
The median fabrication rate across all 35 models tested was ~25%.
As I posted on X: I think this problem isn't talked about enough. Because 100% of the "AI will save/destroy the world" expectations hinge on the ASSUMPTION that this problem will be solved at some point.
What if it just... never gets solved? What if it's fundamental to the way we've designed AI? What if our design guarantees this problem manifests, and we can't solve it without losing all the progress we've made? (The way, say, putting cars on wheels guarantees they'll roll, and you can't solve "car rolls downhill when the brakes are disengaged" without starting the design over from scratch.)
Jason Haver on X (this also contains the link to the study)
Because (as I posed in a follow-up): Imagine a company where the CEO hallucinates data, sales figures, suppliers, manufacturing capacity, etc., at a rate of even 5% per day. That company is bankrupt before it even gets off the ground. ALL "AI will run everything" scenarios assume this gets solved favorably.
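To put numbers on that intuition, here's a minimal sketch (my own illustration, not from the study) of how a 5% per-decision fabrication rate compounds, assuming each day's key decision independently has a 5% chance of resting on made-up data:

```python
# Minimal sketch, under an assumption of my own (not from the study):
# each daily decision independently has a 5% chance of being built on
# fabricated data. Then P(at least one bad decision in N days) = 1 - 0.95^N.

def p_at_least_one_fabrication(rate: float, days: int) -> float:
    """Probability that at least one decision in `days` days is fabricated."""
    return 1 - (1 - rate) ** days

for days in (5, 30, 90):
    print(f"{days:>2} days: {p_at_least_one_fabrication(0.05, days):.0%}")
# Prints roughly:  5 days: 23%,  30 days: 79%,  90 days: 99%
```

Under those (admittedly simplified) assumptions, the odds that at least one decision was built on fabricated data pass 50% in about two weeks.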
Or imagine a doctor who hallucinates illnesses you don't have and cures that don't exist.
In other words, the entire AI dream is built on the belief that this foundational problem will get solved. If this problem does not get solved, then we're currently in a massive hubristic AI bubble.
I'm not saying it won't eventually get fixed -- I don't know whether it will or not. I'm just saying: we have to consider the possibility that it doesn't. In which case, AI may be a decent research partner (assuming you verify everything it tells you), and it may aid in general human progress the way, say, supercomputers did -- but it will never be able to run anything important on its own.
Food for thought.
Market-wise, the last update concluded:
This is probably the last chance for bulls to pull a whipsaw.
And they did. But, in INDU at least, they have not yet broken back above key resistance. Let's look at SPX first:
INDU is below resistance and has so far only back-tested it:
COMPQ, on the other hand, is still above support (!):
In conclusion, at this exact moment, we have a mixed bag of signals. SPX is in no-man's land; INDU is below resistance; COMPQ is above support. This is an absolutely schizoid market, but what we can reasonably infer from this is that bulls are probably out of chances and likely need to keep this bounce going. If they can't, then bears (probably) finally get the ball. Trade safe.


