WEBVTT

00:00:00.060 --> 00:00:02.580
In research, we're quite often interested in theory.

00:00:02.580 --> 00:00:09.000
And in quantitative research, we use numbers either to develop or to justify theory.

00:00:09.000 --> 00:00:13.140
To understand the relationship between theory and data, we need to start

00:00:13.140 --> 00:00:17.460
by defining what we mean by the term "theory".

00:00:17.460 --> 00:00:22.770
We also need to understand a little bit of philosophy to know why and how

00:00:23.370 --> 00:00:28.380
using numbers to justify theory is a justifiable way of doing research.

00:00:28.380 --> 00:00:33.150
First of all, a theory is an explanation.

00:00:33.150 --> 00:00:41.430
A theory is a statement about relationships: how, when, and why those relationships occur.

00:00:41.430 --> 00:00:48.540
This is different from a description, which just states what happens without asking why.

00:00:48.540 --> 00:00:55.740
Typically, when we write research papers, it's easy to say what happens and how.

00:00:55.740 --> 00:01:00.510
But justifying the "why" part of the theory, that's the hard part.

00:01:00.510 --> 00:01:07.650
In Bacharach's presentation, theory is presented as levels.

00:01:07.650 --> 00:01:16.380
First, we have the theoretical concept, X, here. And when we go from a high level of abstraction to

00:01:16.380 --> 00:01:20.670
a lower level of abstraction, we have constructs, which are a more refined

00:01:20.670 --> 00:01:24.840
version of a concept according to his definition; there are multiple definitions.

00:01:24.840 --> 00:01:28.920
And then we have variables, or events, that we actually observe.

00:01:28.920 --> 00:01:36.090
Then we have relationships: we make claims about how these theoretical concepts are related,

00:01:36.090 --> 00:01:41.010
based on the relationships between the observable variables.
00:01:41.010 --> 00:01:44.160
Then we have some kind of boundary conditions and

00:01:44.160 --> 00:01:47.820
assumptions that are required for this theory to hold.

00:01:47.820 --> 00:01:53.850
For example, we could say that our theory of CEO gender causing companies to be more

00:01:53.850 --> 00:02:00.360
profitable is constrained to countries that are similar to Finland.

00:02:00.360 --> 00:02:06.090
To understand the relationship between theory and data, let's look at Deephouse's paper.

00:02:06.090 --> 00:02:10.650
Deephouse's theory is an elaborate explanation of why

00:02:10.650 --> 00:02:14.310
firms that differentiate should be more profitable,

00:02:14.310 --> 00:02:20.430
or why too much differentiation should lead to lower profitability.

00:02:20.430 --> 00:02:26.490
The data are from banks: 159 of them.

00:02:26.490 --> 00:02:29.700
They have return on assets as the dependent variable,

00:02:29.700 --> 00:02:34.980
and the sum of deviations in the proportions of 11 asset categories.

00:02:34.980 --> 00:02:40.350
Basically, that means they calculated what kinds of assets each bank held,

00:02:40.350 --> 00:02:46.350
and then they compared how different a particular bank was from the industry mean.

00:02:46.350 --> 00:02:49.470
That was their measure of strategic deviation.

00:02:49.470 --> 00:02:52.740
Then they found a negative correlation.

00:02:52.740 --> 00:02:56.700
So what is the relationship here, and how does it work?

00:02:56.700 --> 00:02:59.820
Can we, based on this small correlation,

00:02:59.820 --> 00:03:07.650
come up with this elaborate theory of how strategic deviation, or

00:03:07.650 --> 00:03:11.910
differentiation, is related to profitability? Or is it the other way around?

00:03:11.910 --> 00:03:19.470
Are we using the theory to decide what data to collect, and then using those data to justify the theory?

00:03:19.470 --> 00:03:21.600
The second option is correct.
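The deviation measure described above can be sketched as follows. This is a minimal illustration that assumes the deviation is the sum of absolute differences between a bank's asset proportions and the industry-mean proportions. The bank names and numbers are hypothetical, and three asset categories stand in for the eleven used in the study.

```python
# Hypothetical asset-category proportions for four banks (each row sums to 1).
# Three categories keep the example small; the study described above used 11.
banks = {
    "bank_a": [0.50, 0.30, 0.20],
    "bank_b": [0.40, 0.40, 0.20],
    "bank_c": [0.60, 0.20, 0.20],
    "bank_d": [0.30, 0.30, 0.40],
}


def industry_mean(profiles):
    """Average proportion of each asset category across all banks."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def strategic_deviation(profile, mean_profile):
    """Assumed measure: sum of absolute differences from the industry-mean proportions."""
    return sum(abs(p - m) for p, m in zip(profile, mean_profile))


mean_profile = industry_mean(list(banks.values()))
deviations = {name: strategic_deviation(p, mean_profile) for name, p in banks.items()}

for name, value in deviations.items():
    print(f"{name}: {value:.2f}")
```

Here bank_a, whose asset mix is closest to the industry profile, gets the smallest deviation (0.10).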
00:03:21.600 --> 00:03:28.140
You cannot come up with this kind of elaborate theory based on this correlation alone,

00:03:28.140 --> 00:03:32.550
because the correlation only tells us that there is a statistical association.

00:03:32.550 --> 00:03:35.640
It doesn't tell us whether the association is causal.

00:03:35.640 --> 00:03:42.060
And in particular, it doesn't answer the important question of why these two variables are related.

00:03:42.060 --> 00:03:47.670
So theory typically needs to come from somewhere other than the numbers.

00:03:47.670 --> 00:03:53.100
To further understand the relationship between theory and data, we need

00:03:53.100 --> 00:03:57.060
to look a bit at the types of reasoning that we use in science.

00:03:57.060 --> 00:04:04.410
Inductive and deductive reasoning are the two most common forms of reasoning in science.

00:04:05.430 --> 00:04:09.090
This is a classic example of induction and deduction.

00:04:09.090 --> 00:04:11.880
We have the major premise: all men are mortal.

00:04:11.880 --> 00:04:16.740
The minor premise: Socrates is a man. And then the conclusion: Socrates is mortal.

00:04:16.740 --> 00:04:23.190
Deductive and inductive reasoning are about knowing

00:04:23.190 --> 00:04:29.260
two of these and then inferring the third. How do we get from two of these to the third?

00:04:29.260 --> 00:04:33.760
In deductive reasoning, we know that all men are mortal,

00:04:33.760 --> 00:04:37.660
we know that Socrates is a man, and then we infer that,

00:04:37.660 --> 00:04:43.780
because the major premise and the minor premise are true, Socrates must be mortal.

00:04:43.780 --> 00:04:49.420
So if these two claims are true, then the conclusion must also be true.

00:04:49.420 --> 00:04:53.140
Deductive reasoning preserves truth.

00:04:53.140 --> 00:04:59.350
So if the conclusion is not true, then at least one of the premises must be false.
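Two claims can be checked by listing every truth assignment: deduction preserves truth, and a false conclusion means at least one premise is false. This sketch assumes the major premise is reduced to the conditional "if Socrates is a man, then Socrates is mortal", which keeps the example propositional. The helper name implies is illustrative.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


# Every truth assignment for ("Socrates is a man", "Socrates is mortal").
rows = list(product([True, False], repeat=2))

# Deduction (modus ponens): whenever both premises are true, the conclusion is true.
deduction_valid = all(
    mortal
    for man, mortal in rows
    if implies(man, mortal) and man
)

# Refutation (modus tollens): whenever the conclusion is false,
# the two premises cannot both be true.
refutation_valid = all(
    not (implies(man, mortal) and man)
    for man, mortal in rows
    if not mortal
)

print(deduction_valid, refutation_valid)  # True True
```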
00:04:59.350 --> 00:05:07.360
For example, suppose we observe that Socrates is a man, and then we observe that Socrates is not mortal.

00:05:07.360 --> 00:05:13.330
Then our assumed rule that all men are mortal is incorrect.

00:05:13.330 --> 00:05:15.940
We can clearly say that if Socrates is a man,

00:05:15.940 --> 00:05:19.510
and Socrates is not mortal, then not all men are mortal.

00:05:20.260 --> 00:05:23.740
We can rule out, or refute, claims using

00:05:23.740 --> 00:05:28.900
deductive reasoning by observing something that differs from the expected conclusion.

00:05:28.900 --> 00:05:32.830
Inductive reasoning works the other way around.

00:05:32.830 --> 00:05:35.800
We observe that Socrates is a man,

00:05:35.800 --> 00:05:42.100
we observe that Socrates is mortal, and then we infer from that that all men are mortal.

00:05:43.210 --> 00:05:48.640
We go from a specific case to a generalization. That's inductive reasoning.

00:05:48.640 --> 00:05:55.750
Of course, observing that one man is mortal doesn't mean that all men are mortal.

00:05:56.740 --> 00:06:03.280
The problem of induction is that even if we observe something a thousand times, we can't

00:06:03.280 --> 00:06:10.000
guarantee that we would observe the same thing in the 1,001st case.

00:06:10.000 --> 00:06:16.390
For example, even if all the swans that you have seen in your life so far have been white,

00:06:16.390 --> 00:06:18.130
it doesn't mean that all swans are white.

00:06:18.130 --> 00:06:24.580
There actually are black swans in Australia, so inductive reasoning is not guaranteed to work.

00:06:24.580 --> 00:06:29.110
There is some debate in the philosophy of science and the history of the philosophy of

00:06:29.110 --> 00:06:32.920
science about induction, but the general understanding is that inductive

00:06:32.920 --> 00:06:38.800
reasoning is useful, although it doesn't always lead to the correct conclusion.
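A few lines of Python show the asymmetry between confirming and refuting a generalization. The observation list is hypothetical. A thousand white swans are consistent with the rule but do not prove it, while a single black swan refutes it.

```python
def all_swans_white(observations):
    """The inductive generalization holds only if every observed swan is white."""
    return all(colour == "white" for colour in observations)


# Hypothetical record: a thousand white swans observed so far.
observed = ["white"] * 1000
print(all_swans_white(observed))  # True: consistent with every case so far, not proven

# One further observation (a black swan, say in Australia) is enough to refute it.
observed.append("black")
print(all_swans_white(observed))  # False
```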
00:06:39.790 --> 00:06:43.990
For completeness, I'll also explain abductive reasoning.

00:06:43.990 --> 00:06:48.730
In abductive reasoning, we know that Socrates is mortal,

00:06:48.730 --> 00:06:55.360
we know that all men are mortal, and we infer the minor premise.

00:06:55.360 --> 00:06:59.740
Abductive reasoning is inference to the best explanation.

00:07:00.550 --> 00:07:04.450
We are answering the question: we observe that Socrates is mortal,

00:07:04.450 --> 00:07:09.670
so why could that be the case? We infer that maybe Socrates is a man.

00:07:09.670 --> 00:07:14.920
This is even weaker than induction, and it's not very commonly used,

00:07:14.920 --> 00:07:20.620
except for generating hypotheses, where basically everything is allowed,

00:07:20.620 --> 00:07:22.840
as I will explain a few slides from now.

00:07:24.280 --> 00:07:26.920
We'll be focusing on inductive and deductive reasoning.

00:07:26.920 --> 00:07:34.000
Let's take a look at the diagram that I showed you a few slides ago.

00:07:34.000 --> 00:07:39.910
We have different levels of abstraction. First, we have the theoretical concepts.

00:07:39.910 --> 00:07:42.430
We typically want to say something about these concepts.

00:07:42.430 --> 00:07:47.920
And we can use the term "construct" to refer to a theoretical concept.

00:07:47.920 --> 00:07:51.100
Then we have propositions, which are

00:07:51.100 --> 00:07:54.190
claims about the relationships between theoretical concepts.

00:07:54.190 --> 00:08:00.280
For example, we can say that the gender of a company's CEO causes profitability differences.

00:08:00.280 --> 00:08:04.600
Profitability difference is a theoretical concept, and

00:08:04.600 --> 00:08:06.910
CEO gender is also a theoretical concept.

00:08:06.910 --> 00:08:12.490
The relationship between these is a proposition, and the proposition is part of a theory.
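Abduction can be checked the same way. This sketch again reduces the major premise to a conditional about Socrates. It then searches for a truth assignment in which both known statements hold but the inferred minor premise is false. Finding one shows that abduction, unlike deduction, does not preserve truth. In formal logic this pattern is called "affirming the consequent".

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q


# Abduction: from "if Socrates is a man, he is mortal" and "Socrates is mortal",
# infer "Socrates is a man". Collect assignments where both known statements
# are true but the inferred statement is false.
counterexamples = [
    (man, mortal)
    for man, mortal in product([True, False], repeat=2)
    if implies(man, mortal) and mortal and not man
]

print(counterexamples)  # [(False, True)]: something mortal that is not a man
```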
00:08:12.490 --> 00:08:16.300
It's a statement that summarizes the main claim of the theory.

00:08:16.300 --> 00:08:20.770
Then we have empirical concepts. These are things that we

00:08:20.770 --> 00:08:24.730
can actually measure directly. And we have a hypothesis.

00:08:25.780 --> 00:08:32.410
The idea is that if we have two theoretical concepts that are related,

00:08:32.410 --> 00:08:36.700
then we should also have measures of

00:08:36.700 --> 00:08:41.470
those theoretical concepts that are related, as stated in a hypothesis.

00:08:41.470 --> 00:08:44.080
So we formulate a statistical hypothesis that we actually test.

00:08:44.080 --> 00:08:49.300
For example, we could say that a doctor's assessment

00:08:49.300 --> 00:08:51.850
of the CEO's gender, which is directly observable,

00:08:51.850 --> 00:08:59.410
and return on assets from the profitability report to the tax authorities, which is directly

00:08:59.410 --> 00:09:04.510
observable, are related if there is a relationship between CEO gender and profitability.

00:09:04.510 --> 00:09:06.100
That is our hypothesis.

00:09:06.100 --> 00:09:08.350
Then we collect actual data.

00:09:08.350 --> 00:09:12.730
We have actual observations of actual companies on things that we can observe,

00:09:12.730 --> 00:09:16.090
and we test whether there's a statistical association.

00:09:16.090 --> 00:09:19.930
If there is a statistical association, then we can

00:09:19.930 --> 00:09:21.970
conclude that the hypothesis and the proposition are supported.

00:09:21.970 --> 00:09:26.470
So how do induction and deduction work in this kind of framework?

00:09:26.470 --> 00:09:30.490
The idea of induction is that we start from a statistical association

00:09:30.490 --> 00:09:38.050
and say that because two things are correlated, there must be a theoretical relationship.
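One way to test the statistical association described above is a permutation test on the difference in mean ROA between firms with and without a female CEO. This is a minimal sketch. The ten firms and their ROA values are hypothetical, and the permutation test is only one of several suitable tests. A small p-value would indicate association, not causation.

```python
import random
from statistics import fmean


def mean_difference(roa, female_ceo):
    """Mean ROA of firms with a female CEO minus mean ROA of the other firms."""
    female = [r for r, f in zip(roa, female_ceo) if f]
    other = [r for r, f in zip(roa, female_ceo) if not f]
    return fmean(female) - fmean(other)


def permutation_p_value(roa, female_ceo, n_permutations=5000, seed=1):
    """Two-sided p-value: share of label shuffles giving a difference at least as large."""
    rng = random.Random(seed)
    observed = abs(mean_difference(roa, female_ceo))
    labels = list(female_ceo)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        # Small tolerance so that exact ties are not lost to floating-point rounding.
        if abs(mean_difference(roa, labels)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: ROA (%) and whether the CEO is female, for ten firms.
roa = [4.1, 3.8, 5.0, 4.6, 4.9, 2.9, 3.1, 3.5, 2.7, 3.3]
female_ceo = [True, True, True, True, True, False, False, False, False, False]

print(f"difference in mean ROA: {mean_difference(roa, female_ceo):.2f}")
print(f"permutation p-value: {permutation_p_value(roa, female_ceo):.4f}")
```

With these made-up numbers the observed difference is rarely matched by shuffled labels, so the statistical hypothesis would be supported. Whether CEO gender causes the difference is a separate question.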
00:09:38.830 --> 00:09:45.100
We observe a correlation between CEO gender and profitability, and we infer that CEO gender,

00:09:45.100 --> 00:09:49.540
or gender differences, are the cause of the profitability differences.

00:09:49.540 --> 00:09:52.780
So we go from a specific observation to a

00:09:52.780 --> 00:09:56.620
general theory; that's inductive reasoning in a research context.

00:09:56.620 --> 00:10:04.480
In deductive reasoning, we start from the theoretical concepts.

00:10:04.480 --> 00:10:07.930
We state that if these theoretical concepts are related,

00:10:07.930 --> 00:10:11.620
then the empirical concepts should also be related,

00:10:11.620 --> 00:10:15.190
and the measurement results should be related as well.

00:10:15.190 --> 00:10:22.870
If we observe that the measurement result is not what we expected based on deduction,

00:10:22.870 --> 00:10:26.800
then the theoretical relationship is incorrect.

00:10:26.800 --> 00:10:29.050
So that's how deduction works.

00:10:29.050 --> 00:10:35.440
We infer what we should observe, and if we don't observe it, we say that the proposition is not correct.

00:10:35.440 --> 00:10:38.800
Of course, there are many different ways this can fail.

00:10:38.800 --> 00:10:43.660
But the deductive approach is the dominant way of doing research.

00:10:43.660 --> 00:10:47.650
We go from theory to measurement results, and

00:10:47.650 --> 00:10:51.040
then, based on the measurement results, we say something about the theory.

00:10:51.040 --> 00:10:52.210
That's the general idea.

00:10:52.210 --> 00:10:56.140
So how is this justified, and is it ever justifiable

00:10:56.140 --> 00:10:57.940
to use induction in quantitative research?

00:10:57.940 --> 00:11:04.420
Some people say that inductive reasoning is not justified, but it's not so straightforward.

00:11:05.080 --> 00:11:07.660
Let's take a look at the Deephouse paper again.

00:11:07.660 --> 00:11:10.390
They have a proposition.
00:11:10.390 --> 00:11:15.370
The proposition that we'll be focusing on is that moderate amounts

00:11:15.370 --> 00:11:17.380
of strategic similarity increase performance.

00:11:17.380 --> 00:11:21.400
Then they have a statistical hypothesis. The idea here is

00:11:21.400 --> 00:11:24.910
that if the theoretical proposition is correct,

00:11:24.910 --> 00:11:30.100
then we should observe a curvilinear, concave-down (inverted U-shaped) relationship that

00:11:30.100 --> 00:11:35.740
first goes up and then goes down, between strategic deviation and return on assets.

00:11:35.740 --> 00:11:38.560
These are two variables that they can observe

00:11:38.560 --> 00:11:42.340
directly, and this relationship is used as a test of the proposition.

00:11:42.340 --> 00:11:47.230
And finally, we have the test with data.

00:11:47.230 --> 00:11:50.140
They have data from Call Reports, and

00:11:50.140 --> 00:11:54.010
they calculate the deviation and run a regression model.

00:11:54.010 --> 00:12:00.190
The regression results can be used to test the statistical hypothesis.

00:12:00.190 --> 00:12:03.760
The hypothesis was supported in the paper; therefore,

00:12:03.760 --> 00:12:07.150
they conclude that the proposition may be true as well.

00:12:08.140 --> 00:12:10.780
The idea is that a proposition is a theoretical claim.

00:12:10.780 --> 00:12:16.600
Then we have a statistical claim derived from the theoretical claim.

00:12:16.600 --> 00:12:23.470
And finally, we have calculations that actually test the statistical hypothesis.

00:12:23.470 --> 00:12:30.610
Now the million-dollar question is: can we infer a causal theory from a correlation?

00:12:30.610 --> 00:12:37.390
Let's take a look at some correlation examples to see why that could or could not be the case.

00:12:37.390 --> 00:12:40.660
These are correlations from actual observed data.
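A common way to test a curvilinear, inverted U hypothesis is to regress the outcome on the predictor and its square. The hypothesis is supported if the squared term is negative and the turning point lies inside the observed data. The sketch below does this with a hand-written least-squares fit on simulated data, so it needs no third-party library. The data-generating numbers are assumptions, and the 159 observations only mirror the sample size mentioned earlier. Deephouse's actual model, including its control variables, is not reproduced.

```python
import random


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the 3x3 normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                   # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^0..y*x^2
    # Augmented matrix [X'X | X'y].
    a = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Simulated data from an assumed inverted U: ROA peaks at a moderate deviation of 0.5.
rng = random.Random(42)
deviation = [rng.uniform(0.0, 1.0) for _ in range(159)]
roa = [1.0 + 2.0 * d - 2.0 * d ** 2 + rng.gauss(0.0, 0.1) for d in deviation]

b0, b1, b2 = fit_quadratic(deviation, roa)
turning_point = -b1 / (2 * b2)  # deviation at which predicted ROA is highest

# Inverted U supported if the squared term is negative and the peak lies inside the data.
supported = b2 < 0 and min(deviation) < turning_point < max(deviation)
print(f"b2 = {b2:.2f}, turning point = {turning_point:.2f}, supported = {supported}")
```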
00:12:40.660 --> 00:12:45.580
We can see that US spending on science,

00:12:45.580 --> 00:12:49.690
space, and technology is almost perfectly correlated with suicides by hanging,

00:12:49.690 --> 00:12:53.440
strangulation, and suffocation over a 10-year period.

00:12:53.440 --> 00:13:02.860
Can we claim that if the US increases spending, then suicides by hanging will increase?

00:13:02.860 --> 00:13:09.700
Or can we say that the US should decrease science spending so that fewer people die by suicide?

00:13:09.700 --> 00:13:11.560
That's an implausible claim.

00:13:11.560 --> 00:13:13.540
It's unlikely to be true.

00:13:13.540 --> 00:13:19.870
The correlation could instead be explained by a common cause.

00:13:20.420 --> 00:13:25.340
For example, it could be that the population is growing, and when there are more people,

00:13:25.340 --> 00:13:30.260
there are more tax dollars, and more tax dollars mean more spending.

00:13:30.260 --> 00:13:32.510
And because there are more people, there are also more suicides.

00:13:32.510 --> 00:13:36.320
Or the correlation could be due to the state of the economy.

00:13:36.320 --> 00:13:38.810
Let's say that the economy is growing, and both

00:13:38.810 --> 00:13:41.090
of these grow as the economy grows, and so forth.

00:13:41.090 --> 00:13:45.710
Or it could be just chance. Ten observations, ten years, is not that much.

00:13:45.710 --> 00:13:52.010
If you have large data sets and you keep data mining them, like

00:13:52.010 --> 00:13:56.840
Tyler Vigen, who made these graphs, you will find some large correlations.

00:13:56.840 --> 00:14:00.530
For example, the number of people who drowned by falling into a pool

00:14:00.530 --> 00:14:06.350
and the number of films starring Nicolas Cage are also correlated.

00:14:06.350 --> 00:14:11.270
So claiming causality from this correlation would be ridiculous.
00:14:12.620 --> 00:14:14.840
Can we ever make a claim based on a correlation?

00:14:16.580 --> 00:14:21.110
To answer this, we have to understand the hypothetico-deductive method in science.

00:14:21.110 --> 00:14:27.050
The hypothetico-deductive method distinguishes between two contexts.

00:14:27.050 --> 00:14:32.390
First, we have the context of discovery: how we come up with new theories.

00:14:32.390 --> 00:14:39.620
The context of discovery is about how we generate theoretical hypotheses.

00:14:39.620 --> 00:14:44.810
In the hypothetico-deductive method, the starting point is a hypothesis.

00:14:44.810 --> 00:14:47.480
It's a guess about what the result could be.

00:14:47.480 --> 00:14:52.640
For example, a guess that maybe US spending on space and science

00:14:52.640 --> 00:14:57.830
actually causes deaths by suffocation and hanging.

00:14:57.830 --> 00:15:03.770
It's simply a guess, and it doesn't really matter how we come up with it.

00:15:03.770 --> 00:15:10.520
Then we have the context of justification: how do we justify the claim that spending

00:15:10.520 --> 00:15:16.670
on space and science actually increases deaths by suffocation and hanging?

00:15:18.680 --> 00:15:25.610
In hypothetico-deductive reasoning, or research logic, we apply deductive reasoning.

00:15:25.610 --> 00:15:29.090
We assume that the hypothesis is true.

00:15:29.090 --> 00:15:40.070
Then we assume some other, auxiliary hypotheses, and we deduce what we should

00:15:40.070 --> 00:15:45.560
observe if the main hypothesis and the auxiliary hypotheses are true.

00:15:48.980 --> 00:15:54.230
These observable consequences in a research

00:15:54.230 --> 00:15:57.680
paper are typically presented as statistical hypotheses.
00:15:57.680 --> 00:16:02.750
Although hypothetico-deductive reasoning, or the logic of hypothetico-deductive research,

00:16:02.750 --> 00:16:07.220
doesn't really say that the observable consequences must be

00:16:07.220 --> 00:16:11.150
presented as statistical hypotheses, that is how it's commonly done.

00:16:12.800 --> 00:16:19.460
One slightly confusing thing here is that in the hypothetico-deductive method,

00:16:19.460 --> 00:16:22.550
the hypothesis is the theoretical claim.

00:16:22.550 --> 00:16:24.980
It's a hypothesis that we think could be correct.

00:16:24.980 --> 00:16:29.030
But in practice, when we apply hypothetico-deductive reasoning,

00:16:29.030 --> 00:16:32.780
we use the term "hypothesis" for the observable consequence.

00:16:32.780 --> 00:16:36.260
And that can cause some confusion.

00:16:37.700 --> 00:16:44.360
We have a hypothesis about what we should observe if the theory is correct and if some auxiliary hypotheses,

00:16:44.360 --> 00:16:49.790
which we'll cover later, are correct as well. Then we can test it with data.

00:16:49.790 --> 00:16:59.060
If our statistical hypothesis is not supported, meaning we don't observe what was predicted,

00:16:59.060 --> 00:17:06.530
then we infer that the theory, the actual initial hypothesis, must have been incorrect.

00:17:06.530 --> 00:17:14.180
Of course, it's also possible that one of the auxiliary hypotheses is not correct.

00:17:14.180 --> 00:17:19.640
The fact that we don't know which of the hypotheses is incorrect is referred to as

00:17:19.640 --> 00:17:25.340
the underdetermination problem of science. But that's how we do it.

00:17:25.340 --> 00:17:30.050
If we don't observe something, we infer that the theory is probably incorrect.

00:17:30.050 --> 00:17:38.990
On the other hand, if we observe the deduced consequence of the theory, then we can claim

00:17:38.990 --> 00:17:42.680
that the theoretical proposition could be correct.
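Enumerating truth values makes concrete the problem of not knowing which premise failed. This sketch assumes four yes/no premises: the theory plus the three auxiliary hypotheses discussed later in this lecture (a valid measure, a reliable measure, and no other causes). If the predicted association is not observed, only the conjunction of premises is refuted. Seven of the fifteen remaining combinations still have the theory true.

```python
from itertools import product


def all_premises_true(theory, valid_measure, reliable_measure, no_other_causes):
    """The statistical hypothesis is deduced from the theory AND all auxiliaries together."""
    return theory and valid_measure and reliable_measure and no_other_causes


# Suppose the predicted association is NOT observed. By modus tollens, the
# conjunction of all premises must be false, but which premise failed is open.
consistent = [
    combo for combo in product([True, False], repeat=4)
    if not all_premises_true(*combo)
]
theory_true_but_refuted = [combo for combo in consistent if combo[0]]

print(len(consistent), len(theory_true_but_refuted))  # 15 7
```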
00:17:42.680 --> 00:17:49.100
We can't claim that it's definitely correct, because deduction doesn't work that way.

00:17:49.100 --> 00:17:53.960
Using deductive reasoning, we can only refute theories; we can't prove them.

00:17:53.960 --> 00:17:57.140
Support for a theory comes

00:17:57.140 --> 00:18:05.480
when it has been tested over and over and has survived many severe tests.

00:18:05.480 --> 00:18:11.450
Then we can say that if the theory has withstood all these challenges, it's probably true.

00:18:11.450 --> 00:18:15.530
So how does this relate to the previous figure?

00:18:17.300 --> 00:18:20.030
We have induction, and we have deduction.

00:18:20.030 --> 00:18:26.540
We can actually apply inductive reasoning, but only in the context of discovery.

00:18:26.540 --> 00:18:31.040
So we can make theoretical claims based on a correlation.

00:18:31.040 --> 00:18:33.740
We can see that there is a correlation between US

00:18:33.740 --> 00:18:42.410
spending on science and technology and suicides by hanging.

00:18:42.410 --> 00:18:50.960
But the correlation that we used to make an initial guess, or initial hypothesis,

00:18:50.960 --> 00:18:53.570
cannot be used to justify the claim.

00:18:53.570 --> 00:18:57.200
Where the initial theory comes from doesn't

00:18:57.200 --> 00:19:04.070
matter. What matters is whether we can justify the theory empirically.

00:19:04.070 --> 00:19:08.180
And inductive reasoning cannot really be used in the context

00:19:08.180 --> 00:19:10.070
of justification in quantitative research.

00:19:10.070 --> 00:19:18.020
It would be very unlikely that a correlation you just happen to observe

00:19:18.020 --> 00:19:24.110
actually meets the requirements for making causal claims,

00:19:24.110 --> 00:19:32.060
unless it comes from a research design that was specifically built to meet them.
00:19:32.060 --> 00:19:36.170
So deductive logic can be used to justify

00:19:36.170 --> 00:19:40.310
claims, and inductive logic can be used to come up with claims.

00:19:40.310 --> 00:19:45.290
But of course, when you come up with a claim, you also have to provide a justification,

00:19:45.290 --> 00:19:49.910
unless it's a paper where you just present a claim. For example,

00:19:49.910 --> 00:19:54.470
papers in the Academy of Management Review present only claims and no empirical evidence.

00:19:54.470 --> 00:20:00.650
In the context of justification, we also need

00:20:00.650 --> 00:20:02.660
auxiliary hypotheses to be true.

00:20:02.660 --> 00:20:05.840
Here we would need auxiliary hypotheses about

00:20:05.840 --> 00:20:13.190
this empirical concept, let's say return on assets data from the trade register,

00:20:13.190 --> 00:20:18.620
and the theoretical concept of performance. We have to assume that

00:20:18.620 --> 00:20:22.430
this empirical concept is a valid measure of the theoretical concept.

00:20:22.430 --> 00:20:27.890
We also have to assume that this measurement result is a reliable measure of the empirical concept,

00:20:27.890 --> 00:20:33.530
and that there are no other causes that produce this statistical association.

00:20:33.530 --> 00:20:37.040
We need many different auxiliary hypotheses.

00:20:37.040 --> 00:20:42.170
If we don't observe something, it could be that the initial proposition is incorrect.

00:20:42.170 --> 00:20:44.600
Or it could be that an auxiliary hypothesis is incorrect.

00:20:44.600 --> 00:20:45.920
We don't know which one it is.

00:20:45.920 --> 00:20:51.590
We conveniently assume that it's always the proposition, but that's not always the case.

00:20:54.410 --> 00:21:00.470
Finally, you can make claims based on statistics, on a correlation.

00:21:00.470 --> 00:21:04.070
So you can make a causal claim based on a correlation.
00:21:04.070 --> 00:21:09.260
But you cannot justify a causal claim with a correlation alone.

00:21:09.980 --> 00:21:13.430
Sometimes you find interesting correlations in your data,

00:21:13.430 --> 00:21:18.020
you start to think about why there is a correlation, and you come up with a theory.

00:21:18.020 --> 00:21:24.560
That's completely OK, as long as you don't present the correlation that made you

00:21:24.560 --> 00:21:28.640
come up with the theory as evidence justifying the theory.

00:21:28.640 --> 00:21:34.760
You have to keep the context of discovery and the context of justification separate.