{"id":5905,"date":"2019-05-10T11:00:44","date_gmt":"2019-05-10T05:30:44","guid":{"rendered":"https:\/\/gtm360.com\/blog\/?p=5905"},"modified":"2022-03-02T16:16:12","modified_gmt":"2022-03-02T10:46:12","slug":"demystifying-the-ubiquitous-sample-size-of-2000","status":"publish","type":"post","link":"https:\/\/gtm360.com\/blog\/2019\/05\/10\/demystifying-the-ubiquitous-sample-size-of-2000\/","title":{"rendered":"Demystifying The Ubiquitous Sample Size Of 2000"},"content":{"rendered":"<p>Back in the day, we learned in statistics that you need a sample size of at least 2% of the size of the population to make statistically significant inferences about the population. In common speak, the expression &#8220;statistically significant&#8221; means &#8220;valid&#8221;.<\/p>\n<p>Nevertheless, if you&#8217;re like me, you regularly come across surveys conducted on a sample size of 2000 that make inferences about populations of hundreds of millions of members. Below are some examples of such surveys, all of which use sample sizes way below 2% of the population:<\/p>\n<p><strong>EXAMPLE 1<\/strong><\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">British consumers don&#39;t understand what open banking is any more than Americans do, even though in theory they&#39;ve had it for a year <a href=\"https:\/\/t.co\/p3GOSwe28s\">https:\/\/t.co\/p3GOSwe28s<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/OpenBanking?src=hash&amp;ref_src=twsrc%5Etfw\">#OpenBanking<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/datasharing?src=hash&amp;ref_src=twsrc%5Etfw\">#datasharing<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/API?src=hash&amp;ref_src=twsrc%5Etfw\">#API<\/a> <a href=\"https:\/\/twitter.com\/AmerBanker?ref_src=twsrc%5Etfw\">@AmerBanker<\/a> <a href=\"https:\/\/twitter.com\/RFIGroup_?ref_src=twsrc%5Etfw\">@RFIGroup_<\/a><\/p>\n<p>&mdash; Penny Crosman (@pennycrosman) <a 
href=\"https:\/\/twitter.com\/pennycrosman\/status\/1075440678363107329?ref_src=twsrc%5Etfw\">December 19, 2018<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><strong>EXAMPLE 2<\/strong><\/p>\n<p>Not even 2K&#8230;<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">France, Ifop poll:<\/p>\n<p>President Macron Approval Rating<\/p>\n<p>Approve: 27% (+4)<br \/>Disapprove: 72% (-4) <\/p>\n<p>Field work: 11\/01\/19 \u2013 19\/01\/19<br \/>Sample size: 1,928<\/p>\n<p>&mdash; Europe Elects (@EuropeElects) <a href=\"https:\/\/twitter.com\/EuropeElects\/status\/1086794905580695558?ref_src=twsrc%5Etfw\">January 20, 2019<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><strong>EXAMPLE 3<\/strong><\/p>\n<p><a href=\"https:\/\/www.forbes.com\/sites\/ronshevlin\/2019\/01\/21\/an-amazon-checking-account-could-displace-250-billion-in-bank-deposits-but-it-wont\/#39c60ec2254a\" target=\"_blank\" rel=\"noopener noreferrer\">An Amazon Checking Account Could Displace $100 Billion In Bank Deposits (But It Won&#8217;t)<\/a><\/p>\n<p><strong>EXAMPLE 4<\/strong><\/p>\n<p><a href=\"http:\/\/lnr.li\/abu4i\" target=\"_blank\" rel=\"noopener noreferrer\">Most Americans foresee death of cash in their lifetimes<\/a><\/p>\n<p><strong>EXAMPLE 5<\/strong><\/p>\n<p><a href=\"https:\/\/www.finextra.com\/newsarticle\/33498\/consumers-see-bitcoin-as-a-get-rich-quick-investment\" target=\"_blank\" rel=\"noopener noreferrer\">Barely three percent of the 2000+ consumers surveyed by the FCA had made an investment in cryptoassets such as bitcoin and ether<\/a><\/p>\n<hr style=\"width: 70%;\" \/>\n<p>For reference, the population of Great Britain is ~60 million and that of the USA is ~300 million.<\/p>\n<p>The sample sizes in these studies work out to roughly 0.00067 to 0.0033 percent of the respective 
population.<\/p>\n<p>Since they&#8217;re well short of our 2% bar, should we debunk the findings of these studies?<\/p>\n<p>At one point, I thought yes.<\/p>\n<p>But now I&#8217;m not so sure. Many of these studies have been published by reputable media outlets and can&#8217;t be dismissed so easily.<\/p>\n<p>So, I decided to probe the topic further.<\/p>\n<hr style=\"width: 70%;\" \/>\n<p>I came across this\u00a0<a href=\"http:\/\/www.raosoft.com\/samplesize.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>online sample size calculator<\/strong><\/a>, which says that a sample size of 1006 yields results at a 95% confidence level with a 3% margin of error for a population of 300 million.<\/p>\n<p><a href=\"https:\/\/d2jhuj1whasmze.cloudfront.net\/photos\/normal\/m03oZ.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/d2jhuj1whasmze.cloudfront.net\/photos\/normal\/m03oZ.jpg\" alt=\"\" width=\"420\" height=\"360\" \/><\/a><\/p>\n<p>Not convinced by the above, I found a formula to calculate the margin of error\u00a0<a href=\"https:\/\/www.dummies.com\/how-to\/content\/how-to-calculate-the-margin-of-error-for-a-sample0.html\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. 
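<\/p>\n<p>As a rough check, here is the arithmetic behind such figures. It assumes the usual textbook inputs: a 95% confidence level (z = 1.96) and the most conservative response distribution (p = 0.5). The calculator may use slightly different inputs, so its exact numbers can differ a little from these.<\/p>\n<ul>\n<li>Margin of error for a sample of n = 2000: E = z &times; &radic;(p(1 &minus; p) \/ n) = 1.96 &times; &radic;(0.25 \/ 2000) &asymp; 0.022, i.e. about &plusmn;2.2%.<\/li>\n<li>Sample size for a 3% margin of error, ignoring population size: n<sub>0<\/sub> = z<sup>2<\/sup>p(1 &minus; p) \/ E<sup>2<\/sup> = 3.8416 &times; 0.25 \/ 0.0009 &asymp; 1067.1, rounded up to 1068.<\/li>\n<li>Adjusted for a population of size N: n = N &times; n<sub>0<\/sub> \/ (N &minus; 1 + n<sub>0<\/sub>). For N = 20,000 this gives about 1014. For N = 300 million it stays at 1068.<\/li>\n<\/ul>\n<p>The population size N appears only in the finite population adjustment, and once N runs into the tens of thousands that adjustment barely changes the result.<\/p>\n<p>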
When I plugged in the values, I found that the results tallied with the above.<\/p>\n<p><a href=\"https:\/\/d2jhuj1whasmze.cloudfront.net\/photos\/normal\/m04wt.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/d2jhuj1whasmze.cloudfront.net\/photos\/normal\/m04wt.jpg\" alt=\"\" width=\"389\" height=\"80\" \/><\/a><\/p>\n<p>I was also intrigued by the following line on the sample size calculator website:<\/p>\n<blockquote><p>\u201cThe sample size doesn&#8217;t change much for populations larger than 20,000.\u201d<\/p><\/blockquote>\n<p>This runs totally counter to the standard creed: &#8220;The larger the population, the larger the sample size required for testing.&#8221;<\/p>\n<hr style=\"width: 70%;\" \/>\n<p>What gives?<\/p>\n<p>I suspect it has something to do with the composition of a population.<\/p>\n<p>Surveys make a tacit assumption that their samples are &#8220;truly representative&#8221; of the population. To understand that concept, consider the following maxim, which I call the <em><strong>Gallup Soup Principle<\/strong><\/em>:<\/p>\n<p><a href=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2019\/05\/gallup-soup-principle-fi.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-6561 size-full\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2019\/05\/gallup-soup-principle-fi.jpg\" alt=\"\" width=\"630\" height=\"280\" srcset=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2019\/05\/gallup-soup-principle-fi.jpg 630w, https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2019\/05\/gallup-soup-principle-fi-200x89.jpg 200w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><\/a><\/p>\n<p>Some populations are homogeneous, i.e. of the same kind. For example, liquids that have been given a good stir. 
(Unlike James Bond&#8217;s Martinis, which are shaken, not stirred!).<\/p>\n<p>Other populations are heterogeneous, i.e. diverse in character. For example, the terrain of the earth (comprising plains, mountains, water bodies, etc.).<\/p>\n<p>A nation tends to be homogeneous on some attributes (e.g. nationality of its residents) but heterogeneous on others (e.g. income).<\/p>\n<p>In this post,<\/p>\n<ul>\n<li><strong><em>Homogeneous<\/em><\/strong> means &#8220;homogeneous by nature&#8221; and \/ or &#8220;becomes homogeneous after being given a good stir&#8221;.<\/li>\n<li><strong><em>Heterogeneous<\/em><\/strong> means &#8220;heterogeneous by nature&#8221; and &#8220;does not become homogeneous even after being given a good stir&#8221;.<\/li>\n<\/ul>\n<p>Unlike liquids, most populations can&#8217;t be stirred, so the &#8220;good stir&#8221; effectively happens by taking a random sample of the population.<\/p>\n<p>For homogeneous populations, a sample size of 2K is akin to the spoonful of soup in the Gallup Soup Principle. Accordingly, we can draw valid inferences about the whole population, however large it is, based on a survey conducted with such a small sample size.<\/p>\n<p>For heterogeneous populations, a sample size of 2K is unlike the spoonful of soup. Accordingly, inferences drawn from a survey with such a small sample may not be valid for the population. 
But that does not stop people from conducting such surveys, which explains why the results of so many studies are misleading and \/ or contradictory to one another.<\/p>\n<p><em>Misleading results<\/em><\/p>\n<ul>\n<li>Indians don&#8217;t speak Hindi (Survey of 2000 Indians in Tamil Nadu, a southern state of India in which Tamil is the local language)<\/li>\n<li>America has no mountains (Survey of the terrain of 2000 square miles of Kansas, a state in the USA that has only plains)<\/li>\n<li><a href=\"https:\/\/gtm360.com\/blog\/2018\/05\/25\/pixie-dust-sampling-or-how-to-commit-harakiri-by-lying-with-big-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">95% of banks have innovation labs<\/a> (Survey of 100% of banks that have innovation labs)<\/li>\n<\/ul>\n<p><em>Contradictory results<\/em><\/p>\n<ul>\n<li>Cash is dead v. Cash in circulation is growing<\/li>\n<li>Branch is dead v. Banks are opening new branches<\/li>\n<li>Omnichannel shopping is BS v. <em>Book Online &amp; Collect At Store<\/em> is the future of retail.<\/li>\n<\/ul>\n<p>Such disingenuous surveys provide the underpinning for <em>Shevlin&#8217;s Law<\/em>.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Shevlin\u2019s Law: &quot;For every data point that proves one point of view, there are two that refute it.&quot; ~ <a href=\"https:\/\/t.co\/VysoVR1vEp\">https:\/\/t.co\/VysoVR1vEp<\/a> via <a href=\"https:\/\/twitter.com\/rshevlin?ref_src=twsrc%5Etfw\">@rshevlin<\/a> .<\/p>\n<p>Also see <a href=\"https:\/\/t.co\/jMryL4E6F9\">https:\/\/t.co\/jMryL4E6F9<\/a> . 
<a href=\"https:\/\/twitter.com\/hashtag\/Statistics?src=hash&amp;ref_src=twsrc%5Etfw\">#Statistics<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/Lying?src=hash&amp;ref_src=twsrc%5Etfw\">#Lying<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/BigData?src=hash&amp;ref_src=twsrc%5Etfw\">#BigData<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/Quantipulation?src=hash&amp;ref_src=twsrc%5Etfw\">#Quantipulation<\/a><\/p>\n<p>&mdash; Ketharaman Swaminathan (@s_ketharaman) <a href=\"https:\/\/twitter.com\/s_ketharaman\/status\/1270996763827146753?ref_src=twsrc%5Etfw\">June 11, 2020<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<hr style=\"width: 70%;\" \/>\n<p>A survey with a sample size of merely 2000 <em>can<\/em> deliver statistically significant results for a population of millions <em>provided<\/em> the sample is representative of the population, which in turn depends on how homogeneous or heterogeneous the population is.<\/p>\n<p>But there&#8217;s no quantitative measure of a population&#8217;s composition, so it&#8217;s not possible to rigorously prove that a sample is representative of the population.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Back in the day, we learned in statistics that you need a sample size of at least 2% of the size of population to make statistically significant inferences about the population. In common speak, the expression &#8220;statistically significant&#8221; means &#8220;valid&#8221;. 
Nevertheless, if you&#8217;re like me, you regularly come across surveys conducted on sample size of &#8230; <a title=\"Demystifying The Ubiquitous Sample Size Of 2000\" class=\"read-more\" href=\"https:\/\/gtm360.com\/blog\/2019\/05\/10\/demystifying-the-ubiquitous-sample-size-of-2000\/\" aria-label=\"Read more about Demystifying The Ubiquitous Sample Size Of 2000\">Read more<\/a><\/p>\n","protected":false},"author":4,"featured_media":6561,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,18,6,4,8,13,1],"tags":[],"class_list":["post-5905","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-b1-integrated-marketing","category-b2-product-v-services","category-bfsi","category-digital-marketing","category-it-marketing","category-product","category-mandatory-category"],"_links":{"self":[{"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/posts\/5905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/comments?post=5905"}],"version-history":[{"count":21,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/posts\/5905\/revisions"}],"predecessor-version":[{"id":9261,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/posts\/5905\/revisions\/9261"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/media\/6561"}],"wp:attachment":[{"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/media?parent=5905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gtm360.com\/blog\/wp-json\/wp\/v2\/categories?post=5905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gtm360.com
\/blog\/wp-json\/wp\/v2\/tags?post=5905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}