myBAR Data Analysis By ChatGPT

I recently asked ChatGPT to dive into my book collection, crunch the numbers, and give me insights into my reading behavior.

This blog post is the result of that exercise.


I’ve read over two thousand books. You’ll find some of them on my bookshelf. But many of the books I’ve read are missing from it because:

  • I borrowed them from lending libraries and returned them after I finished reading them.
  • I purchased some books, read them, and sold them back. There was a Polish book lover in Muscat who had a personal collection of over 10,000 books. He opened a bookstore with an innovative operating model. You bought books as you would from any other bookstore. But, unlike normal bookstores, you could return the books after you read them and get back 50% of the price you had paid for them. So, it effectively functioned like a lending library. I must have bought 200-odd books from this store during my 5+ years of living in Oman in the early-to-mid 1990s. While I retained some of them, I sold back the vast majority.
  • I switched to ebooks nearly 10 years ago. (In case you’re interested, I have both Barnes & Noble Nook Color and Amazon Kindle ebook readers. I prefer the Nook Color.)

Around 15 years ago, I started rating every new book I read on a 5-point scale, as shown in the following exhibit.

I also wrote reviews for most of those books.

You can see my ratings and reviews in this Google Sheet.

I call this the “myBAR Dataset” for ready reference (myBAR stands for “My Book Alert Rating”, a derivative of my perpetually work-in-progress initiative called mybookalert.com).

I recently noticed that the above Google Sheet had grown to over 600 rows.

I wondered whether it contained any hitherto unknown patterns in my reading habits.

I exported the Google Sheet to Excel format and uploaded the resulting XLS file to ChatGPT, with the following prompt:

Can you find any insights in this data?

ChatGPT started crunching the numbers immediately – without asking any questions – and gave me tons of insights, some basic and others advanced.

Basic Insights

These are based on an analysis of the data present in the myBAR dataset.

1. Overview

“Your file has 592 rows and 7 columns”.

I checked the myBAR dataset in Excel and found this to be correct.
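For readers who would rather script this check than open Excel, here is a minimal Python sketch of the same row-and-column count. It assumes the sheet has been exported to CSV. The sample text and its column names (other than Read) are hypothetical stand-ins, not the actual myBAR data.

```python
import csv
import io

# Hypothetical stand-in for a CSV export of the sheet.
# The real myBAR dataset has 7 columns; only "Read" is a known column name.
SAMPLE_CSV = """Title,Author,Rating,Read
Book A,Author X,3,
Book B,Author Y,4,2021-05-01
Book C,Author X,5,2022-01-15
"""

def shape(csv_text):
    """Return (data_rows, columns), not counting the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return len(data), len(header)

n_rows, n_cols = shape(SAMPLE_CSV)
print(f"{n_rows} rows and {n_cols} columns")
```

Whether the header row is counted is worth deciding up front, since that alone can make two tools disagree by one.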

2. Columns overview

ChatGPT interpreted all column labels correctly except the last one. “Read” is meant to be the date on which I read the book, but ChatGPT misinterpreted it as “Whether the book has been read”. I’m guessing it got this wrong because I started entering the date only years after I opened this spreadsheet, so this column is blank in the first 300-odd rows.

3. Overall summary

I used the Data > Filter command in Excel to verify the above.

  • The table looked okay.
  • I discovered that the descriptor for a rating was inconsistent across rows, e.g., some 3* ratings said “Good: Definitely worth reading” whereas others said “Good: Worth reading”.
  • If my ratings roughly follow a normal distribution centered on the middle of the scale, 3* will naturally be the most common rating! (More on that in a bit.)
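The two checks above (how many books got each rating, and which ratings carry more than one descriptor) can also be sketched in a few lines of Python. This is only an illustration: the rows are made up, and apart from the two 3* descriptors quoted above, the descriptor texts are hypothetical.

```python
from collections import Counter, defaultdict

# (stars, descriptor) pairs. Illustrative rows only, not the real data.
rows = [
    (3, "Good: Definitely worth reading"),
    (3, "Good: Worth reading"),
    (3, "Good: Worth reading"),
    (4, "Hypothetical 4* descriptor"),
    (5, "Hypothetical 5* descriptor"),
]

def rating_counts(rows):
    """Count how many rows carry each star rating."""
    return Counter(stars for stars, _ in rows)

def inconsistent_descriptors(rows):
    """Map each star rating to its descriptors, keeping only ratings with more than one."""
    seen = defaultdict(set)
    for stars, descriptor in rows:
        seen[stars].add(descriptor.strip())
    return {stars: sorted(descs) for stars, descs in seen.items() if len(descs) > 1}

print(rating_counts(rows))
print(inconsistent_descriptors(rows))
```

Counting on the star value rather than the descriptor text keeps the distribution correct even while the descriptors are still inconsistent.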

4. Top authors

Hmmm. While many of them are among my favorite authors, the list missed Joseph Heller and James A. Michener, who, along with John Irving, are my Top 3 authors. I checked the myBAR dataset in Excel – it includes only one book each by Heller and Michener.
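One possible explanation, and it is only an assumption on my part, is that the list was ranked by the number of books per author in the dataset. Under that assumption, an author with a single entry cannot make the cut. A small Python sketch with made-up counts shows the effect; "Author Z" is hypothetical.

```python
from collections import Counter

# One entry per book in the sheet. The counts here are invented for illustration.
authors = [
    "John Irving", "John Irving", "John Irving",
    "Joseph Heller",
    "James A. Michener",
    "Author Z", "Author Z",
]

def top_authors(authors, n):
    """Return the n authors with the most books, as (author, count) pairs."""
    return Counter(authors).most_common(n)

print(top_authors(authors, 2))
```

Ranking by count measures how often an author appears in the sheet, not how much I like them, which would explain why favorites with one rated book go missing.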

5. Top rated books

Hmmm.

The first five books are bingo – after all, I’ve rated them 5*. But there’s one more 5* rated book in the myBAR dataset: Gödel, Escher, Bach: an Eternal Golden Braid by Douglas R. Hofstadter. I’m not sure why ChatGPT missed it.

Then there are 44 books rated 4*. I’m not sure on what basis ChatGPT picked the above four out of those 44. I just checked – these four entries are not even the first four 4* rated books in the myBAR dataset.
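The difference between "every book with a given rating" and "the top N books" is easy to see in code. The sketch below is illustrative only: the titles marked hypothetical and all the ratings attached to them are invented, and I am not claiming this is how ChatGPT built its list.

```python
# (title, stars) pairs in sheet order. Sample data, not the real dataset.
books = [
    ("Hypothetical Title A", 4),
    ("Gödel, Escher, Bach: an Eternal Golden Braid", 5),
    ("Hypothetical Title B", 3),
    ("Hypothetical Title C", 5),
    ("Hypothetical Title D", 4),
]

def books_with_rating(books, stars):
    """Return every title with exactly this rating, in sheet order."""
    return [title for title, rating in books if rating == stars]

def top_n(books, n):
    """Return the n highest-rated books; sorted() is stable, so ties keep sheet order."""
    return sorted(books, key=lambda book: book[1], reverse=True)[:n]

print(books_with_rating(books, 5))
print(top_n(books, 3))
```

A filter returns all matching books, whereas a top-N cut has to break ties somehow. With 44 books tied at 4*, any four of them are an equally valid "top", so the tie-breaking rule needs to be stated for the list to be checkable.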

Advanced Insights

After providing the above basic insights by crunching the data present in the myBAR dataset, ChatGPT offered to gather additional data not present in the dataset and deliver advanced insights based on it.

I asked ChatGPT to go ahead.

It replied:

I’ll get back with the results soon.

This is the first time that ChatGPT has gone asynch on me. In my hundreds of convos with it during the last three years, ChatGPT has always started typing its answers as soon as I’ve typed my prompt and hit enter.

I’m guessing this is because it might take a while to collect and analyze the data required to provide the advanced insights, and ChatGPT didn’t want me to hang around wasting my time.

I was curious to know how ChatGPT would get back to me. Would it send me an email when the results were ready, as many services do when you ask them to do time-consuming tasks (e.g., Download Twitter Archive)?

It replied that it did not have the ability to send out emails – or any form of notifications – outside of the chat. ChatGPT told me that it’d refresh the page when it had finished crunching the data and asked me to come back to the page after a couple of hours.

When I revisited, I saw the following results on the page.

6. Genre Summary

7. Top-Rated Books by Genre

8. Overall Insights


I verified the basic insights and can confirm that ChatGPT got them right – except where I’ve pointed out above that it didn’t.

I had a quick glance at the advanced insights. While nothing looked awry, I must admit that it’s above my pay grade to verify them.

I’ve been following successive generations of data analytics platforms, from plain vanilla analytics through to smart data discovery. As I noted in my blog post entitled Data Analysis By ChatGPT, ChatGPT beats them all on cost and speed. Its analysis of the myBAR dataset only reinforces that opinion. But I have no idea about the accuracy of its advanced insights.

As far as I know, verifying those advanced findings requires sophisticated data analytics tools.

This brings us to an impasse: On the one hand, we cannot verify the accuracy of ChatGPT’s results. On the other hand, we cannot accept unverified results.

Does that mean we should avoid using ChatGPT for data analysis?

No. In my view, it’s a good idea to use ChatGPT to carry out quick-and-dirty data analysis and get as many insights as possible within the shortest possible time. If any of those insights are actionable, then it’s worth spending the time and money on a purpose-built data analytics tool to validate them before taking the suggested actions.