Evaluating ChatGPT & AI Systems – Apply Our Social Skills

borrowed from https://arrc.au

ChatGPT and its many competitors are now flooding the market. Prognosticators, bloviators, and professors are wagging their tongues at top speed full-time. This headline appeared in the NYTimes two days ago: “Elon Musk and Others Call for Pause on A.I., Citing ‘Profound Risks to Society’”. Leaving aside the ironies of Twitter owner Musk calling out others as a danger to society, this is at least some indication that the outburst of AI applications has attracted attention and raised worries. Rightfully so.

Let’s be clear about one especially disturbing factor. Our existing media monopolies and other giant corporations are largely underwriting all of this AI development work, and they have a proven track record of preferring profits over ethics. Google, Twitter, TikTok, and Facebook manage social media platforms that continuously spew misinformation and disinformation all around the world. So there is every reason to expect that whatever comes of AI in the social media world will follow this pattern. They will design and implement AI to increase their sales and profits. They are not do-gooders.

A central problem in evaluating AI is that almost none of us can or will come to understand the inner mechanisms of this software. The field is dense with complex mathematics, logic, and massive computer systems, and it has its own forest of terminology, compounded by its reduction to acronyms.

Turn On Your Bullshit Detector

It seems to me that we need to apply our social skills to this problem. In our day-to-day world, we evaluate sources of information and ideas all the time. Mostly quite unconsciously.  We apply our source evaluation tools in a graduated mode. Not every piece of information or new idea needs to be fully vetted.

When we are casually chatting with neighbors and friends, we have our source validation tools turned way down. After all, we are just chatting. No big decisions are being made. It’s just a chat. Think for a moment what sort of startling, novel, or outrageous comment it takes to get us to say, “Now, hold on a minute. Where did that information or idea come from?” And don’t forget that there are some neighbors not worth speaking to at all. You know from previous experience that they are unreliable, a fool, or totally ignorant.

Then, we can turn to a situation in which we need to be more rigorous. Let’s take a problem at work or school. Generally, we have a sense of who is a reliable source of information or ideas to solve a problem. Some of this sense is based on:

  • direct past history with the person
  • the person’s reputation: what others say about them
  • we might also know that they have many years of experience dealing with the problems at hand
  • they may also have credentials from known educational institutions, such as degrees from a recognized school
  • we know that if we challenge them to explain or defend their solutions, they will come up with concrete evidence: “I’ve done it this way many times before. You only have to look out for this issue.”
  • they will cite authoritative third-party evidence (see below for what that constitutes)
  • they will suggest a problem-solving approach to help uncover a solution
  • finally, they know that their reputation is on the line, so they are less likely to just blather or bullshit.

Authoritative?

What do we mean by authoritative? A dictionary definition looks like this: “able to be trusted as being accurate or true; reliable: clear, authoritative information and advice | an authoritative source. (of a text) considered to be the best of its kind and unlikely to be improved upon: the authoritative study of mollusks.” Perhaps we can sharpen this a bit by considering a three-tier hierarchy of authority.

The first level of authority applies to topics where many people have examined the available information and come to similar conclusions. This is by far the most common kind of information and ideas. You will find it in the NY Times, Wall St. Journal, NPR, LA Times, and many other sources that consistently pay attention to getting the facts straight and then provide some analysis of why, who, and how. This is the level of authority where you need to spend time considering the biases of the source’s thinking. For example, you can reliably predict that the Wall St. Journal will approach topics from a strictly neoliberal point of view, following the thinking of its main audience in the business and finance world.

The second level of authority covers the non-sciences in the academy: economics, sociology, philosophy, history, and so on; any academic field that does not apply the rigors of the scientific method. Information and ideas from this sphere are subject to peer review, so by and large, blather and bullshit don’t get published here. We are still in a world filled with controversy and conjecture, but at least it is generally well-formed controversy and conjecture. For example, tons of data demonstrate that the US healthcare system is outrageously expensive while providing near third-world results. The controversy is over how one might resolve this, which brings you to many conflicting ideas about how society should be organized. Not such an easy problem.

The third level of authority belongs to the world of the sciences. Here we have a fairly rigorous process of sharing data, analysis, and conclusions with other scientists. For example:

Yes, the vast majority of actively publishing climate scientists – 97 percent – agree that humans are causing global warming and climate change. Most of the leading science organizations around the world have issued public statements expressing this, including international and U.S. science academies, the United Nations Intergovernmental Panel on Climate Change, and a whole host of reputable scientific bodies around the world.1

Keep in mind that the process of science does not lead to TRUTH. It leads to the last best explanation for a set of facts. Any scientific truth is waiting for new facts it cannot explain, or for an alternative, simpler explanation. Some truths are pretty reliable: an apple will fall to the ground just as Newton’s proverbial one did. But scientific truth is, by the nature and design of the process, waiting to be invalidated. In fact, that is a signal feature of scientific truth. It must be refutable.

When we look at sources of information and ideas like books, newspapers, TV, blogs, podcasts, videos, documentaries, academic journals, and more, we should always examine where the money that funds them comes from. An examination of the About page on most websites will lead you to a list of the key managers, companies, foundations, and others who lead and fund the source.

Back to ChatGPT

So, let’s turn our bullshit detector on ChatGPT. It is notoriously prone to making factual errors. However, it appears to have improved in providing reference sources for its work. I recently queried ChatGPT about Modern Monetary Theory and inflation. It responded with a reasonable explanation of the topic, though it spent more time on critique than explication. At my request, it also provided a list of references that included many leading lights in the MMT field.

In querying ChatGPT about MMT and inflation, I had the following exchange:

My question: Why wouldn’t the increased demand caused by increases in the money supply simply bring about greater production with the unused production capacity?

ChatGPT response: While an increase in demand can lead to greater production and utilization of unused production capacity in the economy, there are limits to how much additional production can be achieved. Even with unused capacity, there are often constraints on the availability of inputs such as labor, raw materials, and capital equipment, which can limit the ability of firms to increase production.

Additionally, increasing production can be costly and time-consuming, particularly in industries that require specialized equipment or skilled workers. This can lead to a lag between increased demand and increased production, during which time prices can rise as firms seek to capitalize on the increased demand.

Moreover, it is important to consider the types of goods and services that are in demand. If the increased demand is for goods and services that are in limited supply or that cannot be easily produced, such as housing or healthcare, then an increase in the money supply can lead to inflation even if there is unused production capacity in other sectors of the economy.

Overall, while an increase in the money supply can lead to increased production and utilization of unused production capacity in the economy, there are limits to how much additional production can be achieved in the short term. Additionally, the specific nature of the goods and services in demand can also play a role in determining whether an increase in the money supply leads to inflation or not.

This is a pretty sophisticated response. I am impressed. However, on checking the references provided for this interchange, one led to a graph from the St. Louis Federal Reserve Bank displaying the well-known fact that the US economy has been operating significantly below capacity for decades. The other three references had broken links or simply made no sense as references. Despite repeated requests to ChatGPT to correct these errors, it simply repeated them.
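
One low-tech way to apply the bullshit detector to a reference list like this is simply to check whether the cited URLs resolve at all. Here is a minimal sketch in Python, assuming you have pasted the references into a list; the URLs below are placeholders of my own, not the ones ChatGPT gave me.

```python
# Minimal sketch: test whether cited reference URLs resolve at all.
# The list below is a placeholder; substitute whatever references you were given.
import urllib.error
import urllib.request

references = [
    "https://fred.stlouisfed.org/series/TCU",      # example: a capacity-utilization series
    "https://example.com/some-broken-reference",   # placeholder for a dead link
]

def check_url(url: str, timeout: float = 10.0) -> str:
    """Return a short status string: the HTTP status, or the error encountered."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "reference-checker"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"OK ({response.status})"
    except urllib.error.HTTPError as err:
        return f"HTTP error {err.code}"
    except urllib.error.URLError as err:
        return f"unreachable ({err.reason})"

for url in references:
    print(f"{check_url(url):<22} {url}")
```

A link that resolves is, of course, only the first hurdle; whether the page actually supports the claim still takes a human reading it.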

My bullshit detector was aroused. My suspicion is that this is an example of what Noam Chomsky pointed out in his joint NYTimes opinion piece about AI of March 8, 2023:

“Roughly speaking, they take huge amounts of data, search for patterns in it and become increasingly proficient at generating statistically probable outputs — such as seemingly humanlike language and thought.”2
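
To make “statistically probable outputs” a little more concrete, here is a toy sketch of my own (not OpenAI’s code; the vocabulary and probabilities are invented): at each step, a language model scores the possible next tokens and samples from that probability distribution, with no step that checks facts or consults sources.

```python
# Toy illustration of "generating statistically probable outputs".
# The vocabulary and probabilities are made up; a real model like ChatGPT scores
# tens of thousands of tokens using billions of learned parameters.
import random

# Hypothetical next-token probabilities after the prompt "The apple fell to the"
next_token_probs = {
    "ground": 0.72,
    "floor": 0.15,
    "earth": 0.08,
    "moon": 0.05,
}

def sample_next_token(probs: dict) -> str:
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print("The apple fell to the", sample_next_token(next_token_probs))
```

Nothing in a loop like this verifies a claim or a citation; it only favors continuations that looked likely in the training data, which is consistent with fluent answers accompanied by broken references.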

I then asked ChatGPT the following, and it replied:

My Question: My PSA level has varied from 5.1 to 4.3 over the past four years. Should I be concerned?

ChatGPT: It is always a good idea to keep an eye on changes in your PSA (Prostate-Specific Antigen) levels, as fluctuations can sometimes indicate underlying health issues. However, a small variation in your PSA levels, such as the ones you mentioned (5.1 to 4.3), may not necessarily be cause for concern.

PSA levels can naturally fluctuate due to a variety of factors, such as age, prostate inflammation, or recent sexual activity. Additionally, PSA levels can also vary depending on the laboratory that analyzes the blood sample, which is why it’s important to use the same lab for all your PSA tests.

It’s important to note that a PSA level of 4.0 ng/mL or higher is generally considered elevated and may warrant further investigation, such as a prostate biopsy. However, even if your PSA levels are elevated, it doesn’t necessarily mean that you have prostate cancer. Other conditions, such as an enlarged prostate or prostatitis, can also cause elevated PSA levels.

It’s best to discuss your PSA levels with your doctor, who can take into account your medical history, symptoms, and other factors to determine if further testing or monitoring is necessary. They may also recommend lifestyle changes or other measures to help lower your PSA levels if needed.

I then asked ChatGPT to expand on this answer.

My Question: Why did you say this: “However, a small variation in your PSA levels, such as the ones you mentioned (5.1 to 4.3), may not necessarily be cause for concern.”?

ChatGPT: I mentioned that a small variation in PSA levels, such as the ones you mentioned (5.1 to 4.3), may not necessarily be cause for concern because it’s not uncommon for PSA levels to fluctuate slightly over time. PSA levels can be influenced by a variety of factors such as age, prostate size, medications, urinary tract infections, and recent ejaculation. Also, different labs can use slightly different methods for analyzing PSA levels, which can result in slightly different measurements.

While a rise in PSA levels over time may indicate a potential issue, a small variation such as the one you mentioned may not necessarily be significant. However, it’s always important to discuss any changes in PSA levels with your doctor, who can determine if further testing or monitoring is needed.

This response is in line with what I have heard from various doctors about this topic. That’s reassuring. However, when I asked for references, it came up with two generic statements from the American Cancer Society and the Mayo Clinic, followed by three references on completely unrelated topics in pancreatic and lung cancer. The BS detector flashes again.

So, for the moment, ChatGPT does not provide reliable answers, nor can it explain how it came to write what it did, nor provide a trail of references to substantiate its claims. It remains a gigantic probabilistic guessing machine with a clever human language interface.

My bullshit detector is flashing red.


I remain outraged and concerned that huge corporations have unleashed this on the general population without regard to the consequences. Those consequences are just an external cost of their hunt for higher sales and larger profits.

Footnotes

  1. NASA Global Climate Change, “Do Scientists Agree on Climate Change?,” https://climate.nasa.gov/faq/17/do-scientists-agree-on-climate-change/.
  2. Noam Chomsky, Ian Roberts, and Jeffrey Watumull, “Opinion | Noam Chomsky: The False Promise of ChatGPT,” The New York Times, March 8, 2023, sec. Opinion, https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html.