Actually, it’s both possible
This article was originally published in Korean on YOZM-IT.
Various ways of doing data science
There are many programming languages in the world, and a great deal of software built on them. Both play an important role in data science.
For example, if you’re using funnel analysis to improve your product, you might want to:
- Compare the bounce rates of funnel stages before and after an event,
- And perform a proportion test to check whether the difference is statistically significant.
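As a rough illustration of the comparison described above, here is a minimal sketch of a two-sample proportion z-test in plain Python. The function name `two_proportion_z_test` and the visit and bounce counts are hypothetical, chosen only for this example, and the pooled-variance z-test is one common convention among several.

```python
import math

def two_proportion_z_test(bounces_before, visits_before, bounces_after, visits_after):
    """Two-sided z-test for the difference between two bounce rates (pooled variance)."""
    rate_before = bounces_before / visits_before
    rate_after = bounces_after / visits_after
    # Pooled bounce rate under the null hypothesis that both rates are equal
    pooled = (bounces_before + bounces_after) / (visits_before + visits_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_before + 1 / visits_after))
    z = (rate_before - rate_after) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 420 bounces in 1000 visits before the event, 380 in 1000 after
z, p_value = two_proportion_z_test(420, 1000, 380, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that the change in bounce rate is unlikely to be due to chance alone.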
Meanwhile, data scientists have various career backgrounds and experiences, so they tend to use the methods they’re comfortable with, including Python, R, SAS, and more.
This is common because, in most cases, the choice of software makes little difference at the business level.
But what happens if the results differ depending on the software used?
The following image shows the results of running a proportion test in R, Python, and STATA with the example mentioned above.
You can see that even though we used the same values of 1000 and 123, the p-value, which indicates the significance of the proportion test, is slightly different for each method.
There are many reasons why the calculated value can differ depending on the method used, such as:
- Different algorithms in the core logic of the programming language
- Different default values for the parameters of the function
In the example above, if you change the value of the parameter correct in R to “correct = F”, which turns off the “Continuity correction” that R applies by default, you can see that the result is the same as in STATA.
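The effect of such defaults can be sketched in plain Python. This is a minimal illustration, not the code behind the image above: the function name `proportion_test_p`, its parameters, and the null proportion of 0.1 are assumptions made for this example, and the values 123 and 1000 are read here as 123 successes out of 1000 trials. It computes the two-sided p-value of a one-sample proportion test under three conventions: without continuity correction, with Yates' continuity correction, and with the standard error based on the sample proportion instead of the null proportion.

```python
import math

def proportion_test_p(count, nobs, p0, *, continuity=False, sample_variance=False):
    """Two-sided p-value of a one-sample proportion test (normal approximation).

    continuity=True       subtracts Yates' correction (at most 0.5) from |count - nobs*p0|
    sample_variance=True  uses count/nobs instead of p0 in the standard error
    """
    diff = abs(count - nobs * p0)
    if continuity:
        diff -= min(0.5, diff)
    p_var = count / nobs if sample_variance else p0
    se = math.sqrt(nobs * p_var * (1 - p_var))  # standard error on the count scale
    z = diff / se
    return math.erfc(z / math.sqrt(2))  # two-sided tail of the standard normal

# Same data, three conventions; p0 = 0.1 is an assumed null value
count, nobs, p0 = 123, 1000, 0.1
p_plain = proportion_test_p(count, nobs, p0)
p_corrected = proportion_test_p(count, nobs, p0, continuity=True)
p_sample_var = proportion_test_p(count, nobs, p0, sample_variance=True)

print(f"no correction:         {p_plain:.4f}")
print(f"continuity correction: {p_corrected:.4f}")
print(f"sample-based variance: {p_sample_var:.4f}")
```

The three p-values come from identical inputs and differ only in the convention applied, which is why it is worth checking which defaults a given function uses before comparing results across tools.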