
Why do we always take p-value as 5%?


Narender Ravulakollu

Technical Content Writer at almaBetter


Published on 12 Apr, 2023

One of the most frequently asked questions in interviews is why you should always take the p-value threshold as 0.05 (5%), and honestly, it often baffles learners as well.


You have nothing to worry about, though! After reading this article, you will be able to gauge the importance of p-value in a layperson’s terms, instead of being sucked into the matrix of mathematics.  

What is Statistical Significance?

Statistical significance is a measure of whether a result in your research is unlikely to have arisen by chance alone.

The threshold for significance is conventionally set at 5%; however, the right threshold depends on what is at stake with respect to the problem statement.

As we already know, the p-value tells you how easily a result could arise by pure chance. More precisely, a p-value of 5% means that, if there were no real effect at all, a result at least as extreme as the one in your study would still turn up 5% of the time.
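To make this concrete, here is a minimal sketch of a p-value computed by hand. It assumes a coin-flip experiment that is not part of the original article: the "no real effect" assumption is that the coin is fair, and the data (9 heads in 10 flips) and the function name `binomial_p_value` are hypothetical.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """One-sided p-value: chance of seeing at least `heads` heads in `flips`
    flips, assuming the coin really lands heads with probability `p_heads`."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical data: 9 heads in 10 flips of a coin we assume is fair.
p = binomial_p_value(heads=9, flips=10)
print(f"p-value = {p:.4f}")  # 11/1024, i.e. p-value = 0.0107
```

A fair coin produces 9 or more heads in 10 flips only about 1% of the time, so this result would count as significant at the 5% level.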

However, at some point you might wonder whether a 5% chance is too little or too much. Well, it totally depends on what’s at stake.


For example:

  • What if we told you there is a 5% chance of you winning the lottery? You would likely buy a ticket.
  • What if we told you that there is a 5% chance of you getting killed in a car-related accident today? You would likely call in sick.
  • However, if we told you that there is a 5% chance of your vitamin pill not doing anything for you, you would probably keep taking it anyway.

We can conclude that in research the stakes are not the same every time. Furthermore, the effect would vary depending on different problem statements.

Now, let’s delve into an intuition-based understanding.

Note: All examples are constructed to help our readers understand things from a layperson’s perspective by connecting them to relatable life experiences. We have no intention of hurting anyone’s sentiments.

Suppose you are convinced that your best friend's wife is cheating on him. You are throwing a weekend party for him and you are debating whether or not to tell him. On one hand, you don't want to disturb their relationship. On the other hand, you think he has the right to know. What do you do?


You could be mistaken. However, before we get into that, note that there are two possibilities:

  • "She is not having an affair". This is known as the "null hypothesis", since it is the proposition that we should accept as true in the absence of sufficient evidence to the contrary.
  • "She's having an affair". This is the "alternative hypothesis". You will only assert that this statement is true if you have enough data or evidence to support it.

Before we proceed, we must first grasp the two categories of errors that can occur.

  • A Type 1 error occurs when you say that the null hypothesis is false and the alternative is true when in fact the null is true (stating that she is having an affair when she is not), i.e., incorrectly rejecting the null hypothesis.
  • A Type 2 error occurs when you assume that the null is true (accepting that she is not having an affair) when in fact she is, i.e., incorrectly failing to reject the null hypothesis.

Your query relates to Type 1 errors, or more specifically, the probability of making a Type 1 error.

What level of certainty is required to determine the solution to this problem statement? How confident do you need to be that your allegations of cheating against his partner are true?

There is really no way to gauge your level of certainty in situations like these that might occur in real life. 

However, in a statistical hypothesis test, you can be certain that the probability of a Type 1 error is no greater than the "significance level" that you specify. The probability of a Type 1 error can be thought of as the significance level, which is frequently represented by the Greek letter alpha.
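The claim that the significance level caps the Type 1 error rate can be checked with a small simulation. This is a minimal sketch under assumptions that are not in the original article: a two-tailed z-test on samples of 30 values drawn from a standard normal population, so the null hypothesis (mean equal to 0) is true in every trial. The function name `false_positive_rate` and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha: float, trials: int = 10_000, n: int = 30, seed: int = 0) -> float:
    """Fraction of trials in which a true null hypothesis is wrongly rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: population mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)  # z statistic with known sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed p-value
        if p_value < alpha:
            rejections += 1  # a Type 1 error
    return rejections / trials


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: observed Type 1 error rate = {false_positive_rate(alpha):.4f}")
```

With enough trials, the observed rate should land close to the chosen alpha, which is what it means for alpha to be the probability of a Type 1 error.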

So, how sure do you need to be?

What will happen if you are incorrect? In that situation, your friend will unfriend you – and we are not just talking about Facebook! He will be mad at you. He might even become violent.

Hence, you want to have a very low chance of being wrong while making your statement. You want to use an extremely low alpha value. Would you be okay with a 5% chance that you are making something up? Probably not. You would rather have near certainty, with only a 0.01% risk of being wrong.

In such a case, the probability of being wrong should be very low. Taking 5% as the significance level here would mean accepting a 5% chance of being wrong. (If you tell your friend his wife is having an affair, the probability of being wrong should be very small; in other words, you should have enough evidence to prove it.)

Why do you frequently see alpha = 0.05 (5.0%)?

This is because you are reading academic research reports. A researcher who tests at an alpha level of 0.05 accepts a 5% chance of rejecting the null hypothesis when it is actually true.

However, there is a safeguard. The other academics in the same research community are aware of that risk. The same experiment is therefore repeated by other researchers to make sure the initial conclusion was accurate.

What has the initial researcher lost if it turns out that rejecting the null was incorrect? Just a little bit, and possibly not even that.
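The value of replication can be put into numbers. The sketch below assumes that the studies are fully independent and that the null hypothesis is actually true; the function name `chance_all_false_positives` is hypothetical.

```python
def chance_all_false_positives(alpha: float, studies: int) -> float:
    """Probability that `studies` independent tests all wrongly reject a true null."""
    return alpha**studies


for studies in (1, 2, 3):
    chance = chance_all_false_positives(0.05, studies)
    print(f"{studies} independent test(s): {chance:.6f}")
```

Under these assumptions, one test at alpha = 0.05 is fooled 5% of the time, but an original study and one replication are both fooled only 0.25% of the time.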

Let's get into another intuition-based understanding:


For instance, a pharmaceutical company is about to launch a new drug into the market and wants to ensure that the drug is both safe and effective.

Their null hypothesis is that the drug is either unsafe or ineffective. They will reject that null and only release the drug if they are confident that it is both safe and effective.
They want to use a very low significance level because they could be sued for millions of dollars if they are wrong.

You may be wondering why we have barely mentioned the p-value so far. In actuality, we have been talking about it all along.

If the null hypothesis is true, the p-value is the probability of seeing an experimental result at least as extreme as the one observed. The company compares that p-value with its very low significance level, which limits the risk of getting into legal trouble.
By "extreme," we mean a significant departure from the conditions under which the null would be true. In a one-tailed test (e.g., the null hypothesis is that the population mean is greater than or equal to a specified value), the null hypothesis is rejected if the sample mean is significantly less than that value.

In a two-tailed test (e.g., the null hypothesis is that the population mean is exactly equal to a specified value), the null hypothesis is rejected if the sample mean is significantly greater or lower than the hypothetical value.
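The difference between the two kinds of test can be sketched in code. This example assumes a z-test, where the sample mean has already been converted into a standard normal test statistic `z`; the function name `z_test_p_values` and the value z = -1.8 are hypothetical.

```python
from statistics import NormalDist


def z_test_p_values(z: float) -> dict:
    """One-tailed and two-tailed p-values for a standard normal test statistic."""
    cdf = NormalDist().cdf
    return {
        # Null: mean >= value. Only a sample mean far BELOW the value counts as extreme.
        "lower_tail": cdf(z),
        # Null: mean <= value. Only a sample mean far ABOVE the value counts as extreme.
        "upper_tail": 1 - cdf(z),
        # Null: mean == value. A large departure in either direction counts as extreme.
        "two_tailed": 2 * (1 - cdf(abs(z))),
    }


# Hypothetical statistic: the sample mean lies 1.8 standard errors below the value.
for name, p_value in z_test_p_values(-1.8).items():
    print(f"{name}: p = {p_value:.4f}")
```

For z = -1.8, the lower-tail p-value is roughly 0.036 and the two-tailed p-value is roughly 0.072, so at alpha = 0.05 the same data reject the null in the one-tailed test but not in the two-tailed test.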

The decision rule, "reject the null hypothesis if the observed value of the test statistic is more extreme than the critical value", is the same as the decision rule, "reject the null hypothesis if the p-value is less than alpha", in a statistical hypothesis test.

This is because the critical value of the test statistic is the one for which the p-value equals the alpha level you specify.
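The equivalence of the two decision rules can be checked directly. The sketch below assumes a two-tailed z-test; the function names and the list of test statistics are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_critical_value(z: float, alpha: float) -> bool:
    """Reject if the statistic is more extreme than the two-tailed critical value."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical


def reject_by_p_value(z: float, alpha: float) -> bool:
    """Reject if the two-tailed p-value is less than alpha."""
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_value < alpha


# Hypothetical test statistics, none of them sitting exactly on a critical value.
for alpha in (0.05, 0.01):
    for z in (-3.0, -1.8, 0.5, 1.9, 2.1, 2.7):
        assert reject_by_critical_value(z, alpha) == reject_by_p_value(z, alpha)
print("Both decision rules agree for every statistic tested.")
```

The two functions are the same comparison seen from two sides: one compares the statistic with the value whose p-value equals alpha, and the other compares the p-value with alpha itself.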

Finally, the answer to your question: the p-value is the probability of getting results at least as extreme as the ones actually observed if the null hypothesis is true.

What if the p-value doesn’t matter to you?


Well, if you want a quick and dirty survey, and it doesn’t matter much if you reach the wrong decision, then you can work with a confidence level of 0.80 (a significance level of 0.20), accepting a 20% risk of a false alarm.

However, if lives depend on it, as in a drug trial, and you don’t want to kill more people than the standard treatment, then you choose a confidence level of 0.99 (a significance level of 0.01).

Here’s some bad science: choose a confidence level for your sponsored research, say 0.95, then find you don’t quite have a ‘result’ because what you have found only reaches 0.93 (a p-value of 0.07), and then decide to lower the confidence level to, say, 0.90 so that you can report a positive result and thereby ensure that you get further funding.

Conclusion:

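A significance level of 0.05 means that, when the null hypothesis is actually true, you will wrongly reject it about 5 times out of 100.

By convention, 0.05 is usually treated as the upper limit for significance, and a p-value below 0.01 is generally regarded as strong evidence. In the end, the experimenter chooses the significance level based on what is at stake.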

If you want to learn more about Hypothesis Testing and build a career in Data Science, you can sign up for AlmaBetter's Full Stack Data Science program: https://www.almabetter.com/courses/full-stack-data-science.
