WEBVTT 00:00:00.090 --> 00:00:04.980 Qualitative data analysis differs from quantitative data analysis. 00:00:04.980 --> 00:00:10.320 So when you do qualitative analysis, the key problem is 00:00:10.320 --> 00:00:15.570 that it is more of an artistic challenge than an engineering challenge. 00:00:15.570 --> 00:00:20.130 So in quantitative research, when you want to make a causal claim 00:00:20.130 --> 00:00:25.380 after you have collected your data on your association, on your control variables, 00:00:25.380 --> 00:00:31.140 then you simply apply a regression analysis or one of its variants. 00:00:31.140 --> 00:00:34.860 Then the choice of which variant to apply and 00:00:34.860 --> 00:00:37.860 how to do it basically boils down to understanding 00:00:37.860 --> 00:00:43.380 what these different variants do, and then being able to execute that on a computer. 00:00:43.380 --> 00:00:45.720 So it's basically an engineering problem. 00:00:45.720 --> 00:00:50.280 So you have a well-defined analysis problem, you have a set of tools 00:00:50.280 --> 00:00:55.170 and then you need to understand what those tools do and be able to use those tools. 00:00:55.170 --> 00:01:00.390 Then you pick the right tool and you apply the tool according to the instructions. 00:01:00.390 --> 00:01:02.400 So this is what engineers do. 00:01:02.400 --> 00:01:05.310 In qualitative data analysis, on the other hand, 00:01:05.310 --> 00:01:08.580 it's much more about making sense of the data. 00:01:08.580 --> 00:01:12.810 So you have text, video, photographs, 00:01:12.810 --> 00:01:16.560 maybe some numbers and you need to make sense of that 00:01:16.560 --> 00:01:22.380 and try to get to the causal processes that underlie your data, 00:01:22.380 --> 00:01:28.800 and then explain those processes in a way that is clear and convincing to your readers.
00:01:28.800 --> 00:01:31.410 This is much more of an artistic challenge, 00:01:31.410 --> 00:01:36.630 and there is no recipe that you can follow step by step 00:01:36.630 --> 00:01:40.800 and then arrive at some kind of good result. 00:01:41.670 --> 00:01:43.650 This is in contrast to quantitative research, 00:01:43.650 --> 00:01:46.920 where generally no one is going to criticize you 00:01:46.920 --> 00:01:50.970 heavily if you just apply regression analysis and do it correctly. 00:01:50.970 --> 00:01:54.810 So how do we actually do qualitative analysis? 00:01:56.040 --> 00:01:59.520 First we need to take a look at the research process. 00:01:59.520 --> 00:02:03.180 So this is the normal process according to Singleton and Straits, 00:02:03.180 --> 00:02:07.440 which is used for quantitative data analysis or quantitative studies. 00:02:07.440 --> 00:02:10.950 So you start with a research topic, then you have a question, 00:02:10.950 --> 00:02:14.070 which is a more specific thing than a topic. 00:02:14.070 --> 00:02:18.480 Then you prepare your research design, you design the study. 00:02:18.480 --> 00:02:22.020 And here you have two important things: you need to decide 00:02:22.020 --> 00:02:26.100 what to measure and you need to decide what units to measure. 00:02:26.100 --> 00:02:29.490 So these are our variables, these are our cases; we need to think 00:02:29.490 --> 00:02:33.510 what variables and what cases, and where do we get those cases. 00:02:33.510 --> 00:02:36.510 Then you collect the data, you process the 00:02:36.510 --> 00:02:39.810 data and then you analyze the data and interpret the result. 00:02:39.810 --> 00:02:41.010 So that's fairly straightforward. 00:02:41.010 --> 00:02:44.880 If you have ever read about software engineering, 00:02:44.880 --> 00:02:48.630 this is kind of like a waterfall process that just goes from top to bottom.
00:02:48.630 --> 00:02:54.570 Qualitative data analysis, or qualitative research, is quite different, 00:02:54.570 --> 00:02:59.640 because in qualitative research we typically observe processes over time, 00:02:59.640 --> 00:03:01.680 or if we do a retrospective study, 00:03:01.680 --> 00:03:06.870 we have a chance of going back to those companies and asking for more information, 00:03:06.870 --> 00:03:09.960 or going back to people and asking for more information. 00:03:09.960 --> 00:03:12.570 Qualitative data analysis looks more like this. 00:03:12.570 --> 00:03:18.420 So we still have a research topic, we have research questions and we have a research design. 00:03:18.420 --> 00:03:21.780 And then we have some idea, 00:03:21.780 --> 00:03:26.970 an initial idea of where we go for data: do we study organizations, 00:03:26.970 --> 00:03:28.950 which organizations do we study, 00:03:28.950 --> 00:03:33.150 and we also have some idea of what we are going to measure. 00:03:33.150 --> 00:03:38.370 So let's say that we do a qualitative study based on interviews. 00:03:39.330 --> 00:03:43.110 To get started with an interview, for the first interview you need to know 00:03:43.110 --> 00:03:49.260 who you interview, and then you need to have some questions that you ask your interviewee. 00:03:49.260 --> 00:03:51.120 Then we go and collect data. 00:03:51.120 --> 00:03:53.100 But here things differ: 00:03:53.100 --> 00:03:56.820 in quantitative data analysis and quantitative projects, 00:03:56.820 --> 00:04:03.870 the measurement and the sampling are basically decided here and then they are set in stone, 00:04:03.870 --> 00:04:07.620 so you can't really change them afterwards.
00:04:07.620 --> 00:04:09.270 So if you do a survey study, 00:04:09.270 --> 00:04:12.390 for example, after you have sent out the questionnaires, 00:04:12.390 --> 00:04:15.330 you can't easily add more to them, 00:04:15.330 --> 00:04:18.210 particularly if you do paper-and-pencil surveys. 00:04:18.210 --> 00:04:23.730 In qualitative data analysis we typically have a rough idea of 00:04:23.730 --> 00:04:28.260 what we want to study but we don't have a specific theory in mind. 00:04:28.260 --> 00:04:33.840 So in qualitative data analysis we might have a rough idea that, 00:04:33.840 --> 00:04:38.670 for example, naming a woman as CEO causes a company to become more profitable. 00:04:38.670 --> 00:04:43.140 So that would be like an initial hypothesis that we have. 00:04:43.140 --> 00:04:44.880 And then we go and study 00:04:44.880 --> 00:04:49.740 what the women do differently from men to cause the profitability. 00:04:49.740 --> 00:04:51.390 Typically you need to iterate. 00:04:51.390 --> 00:04:54.630 So we collect some data, then we analyze the data, 00:04:54.630 --> 00:04:59.160 we come up with some initial theory on what might be going on. 00:04:59.160 --> 00:05:03.240 Then we go around, we realize that, 00:05:03.240 --> 00:05:07.620 well, we saw that women are actually socially more capable than men, 00:05:07.620 --> 00:05:11.280 then we realize that we need more data about 00:05:11.280 --> 00:05:14.460 the social capability of the female CEOs and the male CEOs. 00:05:14.460 --> 00:05:17.100 Then we go back to the field and we collect more data, 00:05:17.100 --> 00:05:20.640 so we have this iteration of analysis and data collection. 00:05:20.640 --> 00:05:23.370 So typically when we have multiple cases, 00:05:23.370 --> 00:05:26.310 let's say we have six cases in a multiple case study, 00:05:26.310 --> 00:05:30.630 we start the analysis after the first case.
00:05:30.630 --> 00:05:33.630 So after the first interview, we analyze the data, 00:05:33.630 --> 00:05:39.450 we start to think what could explain what this person tells me. 00:05:39.450 --> 00:05:43.920 And then we refine the interview protocol, 00:05:43.920 --> 00:05:46.950 we may add more cases to the study and we iterate. 00:05:46.950 --> 00:05:53.730 We go and we have more measures, we have more cases. 00:05:54.240 --> 00:05:57.120 We could start from, for example, 00:05:57.120 --> 00:06:00.900 four cases and then in the final study we could have eight cases. 00:06:00.900 --> 00:06:02.340 So where do we stop? 00:06:02.340 --> 00:06:05.370 Because you can always find more companies or 00:06:05.370 --> 00:06:08.280 more people or more whatever you're observing. 00:06:08.280 --> 00:06:13.710 We stop when we realize that the final case 00:06:13.710 --> 00:06:18.660 that we added to our study did not really give us any more information. 00:06:18.660 --> 00:06:24.690 So when our idea of a theory does not update anymore after adding a case, 00:06:24.690 --> 00:06:27.360 then we know that we have reached 00:06:27.360 --> 00:06:29.430 what is called theoretical saturation. 00:06:29.430 --> 00:06:35.250 And then we finish the data collection and data analysis and then we write our report. 00:06:35.250 --> 00:06:38.250 So to understand qualitative data analysis, 00:06:38.250 --> 00:06:41.160 you need to understand first that the research process differs. 00:06:41.160 --> 00:06:43.530 In quantitative data analysis, 00:06:43.530 --> 00:06:49.110 you start with the data collection and then you proceed down to data analysis, 00:06:49.110 --> 00:06:53.820 and you hardly ever go back to data collection; sometimes you do, but that's not very common. 00:06:53.820 --> 00:06:57.180 And then you work with what you have and you write your report.
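The stopping rule for adding cases can be sketched roughly in code. This is only an illustration under strong assumptions: each case is reduced to a set of codes, and "no more information" is approximated as "no new codes". In practice saturation is a judgment about whether the theory still changes, not a count. The function name and the example codes are hypothetical.

```python
def cases_until_saturation(cases):
    """Analyze cases one at a time; stop at the first case that adds no new code.

    `cases` is a list of sets of codes, one set per case (hypothetical data).
    Returns how many cases were analyzed, including the one that added nothing.
    """
    seen = set()
    for n, codes in enumerate(cases, start=1):
        new_codes = codes - seen
        if n > 1 and not new_codes:
            return n  # this case did not update our picture: stop here
        seen |= new_codes
    return len(cases)  # saturation not reached within the available cases


# Invented codes for five cases, analyzed in the order they were collected.
cases = [
    {"pride", "long history"},
    {"long history", "trusted image"},
    {"american heritage", "international growth"},
    {"pride", "trusted image"},  # nothing new compared with earlier cases
    {"bad times"},
]

print(cases_until_saturation(cases))  # prints 4
```

Counting new codes is a crude proxy; the point is only that analysis runs after every case and decides whether another case is collected.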
00:06:57.180 --> 00:07:03.960 In qualitative data analysis, the data analysis and the data collection go hand in hand, 00:07:03.960 --> 00:07:09.870 and your data analysis guides your future data collection efforts, so both 00:07:09.870 --> 00:07:15.300 the cases and the interview protocol or observation protocol are updated 00:07:15.300 --> 00:07:18.900 as you get more insights from the data that you analyze. 00:07:18.900 --> 00:07:22.500 So how do you then actually analyze the data? 00:07:22.500 --> 00:07:25.650 There are different ways to do that. 00:07:25.650 --> 00:07:30.960 Typically we pick one of the leading scholars, for example 00:07:30.960 --> 00:07:37.110 Denny Gioia, Kathleen Eisenhardt, or Ann Langley, and we follow what they do. 00:07:37.110 --> 00:07:41.370 All these approaches have something in common, and 00:07:41.370 --> 00:07:44.880 it is called qualitative coding. 00:07:44.880 --> 00:07:54.720 So when we have 500 pages of interview transcripts and 100 pages of field notes 00:07:54.720 --> 00:07:58.200 from our interviews and our observations from the field, 00:07:58.200 --> 00:08:01.590 we can't publish that for two reasons: 00:08:01.590 --> 00:08:04.740 no one is going to read that, no one is going to make sense of that, 00:08:04.740 --> 00:08:07.380 and the second reason is that it's typically confidential. 00:08:07.380 --> 00:08:11.280 So we need to summarize those 500 pages 00:08:11.280 --> 00:08:15.570 into some insights that our readers can then use, 00:08:15.570 --> 00:08:18.930 or whoever is the consumer of our research results. 00:08:18.930 --> 00:08:21.600 And qualitative coding is the way we do that. 00:08:21.600 --> 00:08:27.420 Understanding qualitative coding is perhaps easiest to do by looking at an example.
00:08:27.420 --> 00:08:33.300 So this is a random interview of a random CEO found on the internet, and 00:08:33.300 --> 00:08:37.080 I will demonstrate the principles of qualitative coding 00:08:37.080 --> 00:08:45.150 by just going through how I could code this particular small interview transcript. 00:08:45.150 --> 00:08:49.980 So this is, if I remember correctly, the old CEO of Caterpillar, 00:08:49.980 --> 00:08:55.650 and we are going to code and extract meanings from what he says. 00:08:55.650 --> 00:08:59.640 So typically you go paragraph by paragraph, or sentence by sentence, 00:08:59.640 --> 00:09:03.300 and then you start to think, what does this guy mean. 00:09:03.300 --> 00:09:07.230 So first of all the person says that this is a great company, 00:09:07.230 --> 00:09:12.960 so we can code that as pride, so the person is proud of his company. 00:09:12.960 --> 00:09:18.510 Then he explains that the company has been around for a long time, 00:09:19.170 --> 00:09:21.840 so there's a long history, 140 years. 00:09:21.840 --> 00:09:26.910 We don't need to know about the specifics, so we kind of increase the level of abstraction 00:09:26.910 --> 00:09:30.900 by coding and we seek to extract meaning from these sentences. 00:09:30.900 --> 00:09:34.290 The next thing that we notice is that there are bad times. 00:09:34.290 --> 00:09:42.210 Then we go on and we note that there is a trustworthy image that this company has. 00:09:42.210 --> 00:09:48.600 And it is an American company, the person is proud of the American history of this company. 00:09:48.600 --> 00:09:52.500 And then the company wants to grow internationally. 00:09:52.500 --> 00:09:59.760 So we go through the data and we mark text, then we extract meaning from the text.
00:09:59.760 --> 00:10:03.030 And after we have gone through this initial pass, 00:10:03.030 --> 00:10:07.470 which we call open coding, to get the first-order categories from our data, 00:10:07.470 --> 00:10:12.420 we start to think, do these things have anything in common? 00:10:12.420 --> 00:10:15.750 So are there any second-order categories? 00:10:15.750 --> 00:10:21.210 So, for example, can we abstract these first-order codes? 00:10:21.210 --> 00:10:24.900 We could for example say that the heritage here, 00:10:24.900 --> 00:10:29.850 American heritage and long history, could share something in common. 00:10:29.850 --> 00:10:34.890 So they could indicate that this company gains its legitimacy 00:10:34.890 --> 00:10:38.550 through this long history of operating in the American markets. 00:10:38.550 --> 00:10:40.950 Then we could also code abstract categories, 00:10:40.950 --> 00:10:47.100 for example that this company has a brand advantage because of its trustworthy image. 00:10:47.100 --> 00:10:51.540 So trustworthy image is a more specific code 00:10:51.540 --> 00:10:54.720 than this more general one, that you have brand advantage. 00:10:54.720 --> 00:11:00.570 Once we have combined these codes into more abstract categories, 00:11:00.570 --> 00:11:02.880 which is sometimes referred to as axial coding, 00:11:02.880 --> 00:11:07.620 we start to think how these categories relate, 00:11:07.620 --> 00:11:10.140 and that's part of what we call theorizing. 00:11:10.140 --> 00:11:15.990 So we could for example say that we have this initial theory that 00:11:15.990 --> 00:11:24.780 if you have this legitimacy through long history, then that leads to brand advantage.
00:11:24.780 --> 00:11:28.800 So this company has a brand advantage because of their long history, 00:11:28.800 --> 00:11:35.340 and then we could say that, well, this works in the context of international markets. 00:11:35.340 --> 00:11:40.320 So this is a bit of a silly example but that's the general idea. 00:11:40.320 --> 00:11:44.640 You extract meaning from text as these first-order categories, 00:11:44.640 --> 00:11:47.730 then you have these more abstract second-order categories, 00:11:47.730 --> 00:11:49.620 and then you start to look at relationships. 00:11:49.620 --> 00:11:53.280 Of course, because this is a qualitative study, 00:11:53.280 --> 00:11:56.130 we have to pay attention to two different things. 00:11:56.130 --> 00:12:01.080 One is that this is just an initial idea and then we have to iterate many, many times. 00:12:01.080 --> 00:12:04.290 If we have other cases that we study, if we have a multiple case study, 00:12:04.290 --> 00:12:09.690 we have to check if the other cases support this interpretation, 00:12:09.690 --> 00:12:14.070 or if this is something that is just specific to this case, or if this is something 00:12:14.070 --> 00:12:17.250 that we just can't really support empirically. 00:12:17.250 --> 00:12:22.320 Another thing that we need to look at is the explanation of this process: 00:12:22.320 --> 00:12:26.970 how exactly would this legitimacy lead to brand advantage? 00:12:26.970 --> 00:12:28.800 Then we iterate many, many, many times. 00:12:28.800 --> 00:12:34.860 Nowadays this process is typically supported by computer software, 00:12:34.860 --> 00:12:37.740 and the computer software has a couple 00:12:37.740 --> 00:12:41.460 of advantages over the older style of printing things out 00:12:41.460 --> 00:12:44.760 and then coding with a pencil and notebook. 00:12:44.760 --> 00:12:48.990 First of all, you can automate some things.
00:12:48.990 --> 00:12:50.370 So if you find that, 00:12:50.370 --> 00:12:54.900 for example, innovativeness is a useful construct for your study, 00:12:54.900 --> 00:12:58.500 you can automatically search for innovation-related terms in the data. 00:12:58.500 --> 00:13:04.200 Then it also keeps a record of how you infer different things, 00:13:04.200 --> 00:13:08.130 so you can look at your coding history and then if someone asks you 00:13:08.130 --> 00:13:10.890 how you came up with this theory, 00:13:10.890 --> 00:13:15.960 then you can point to specific instances in your code book. 00:13:15.960 --> 00:13:22.770 And then finally, quite often in qualitative studies 00:13:22.770 --> 00:13:25.170 the results are presented as tables and quotes, 00:13:25.170 --> 00:13:29.310 or we support the results with tables and quotes that illustrate our data. 00:13:29.310 --> 00:13:35.070 Qualitative data analysis software makes that easy to do, 00:13:35.070 --> 00:13:40.830 and easier than if you were using just the old-fashioned pen-and-notebook coding.
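To make the automated search concrete, here is a minimal sketch of what such software might do under the hood. It is not how any particular qualitative analysis package works: the segments are invented, the search stem is an assumption about what counts as an innovation-related term, and `auto_code` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Hypothetical transcript segments; the wording is invented for illustration.
segments = [
    "This is a great company with a long history.",
    "We keep innovating, and new product innovation drives our growth.",
    "Customers trust the brand because we have been innovative for decades.",
]

# Assumed search stem for the construct "innovativeness".
# It matches innovation, innovative, innovating, and similar word forms.
pattern = re.compile(r"\binnovat\w*", re.IGNORECASE)


def auto_code(segments, code, pattern):
    """Return a code book mapping `code` to (segment index, quote) pairs."""
    codebook = defaultdict(list)
    for i, text in enumerate(segments):
        if pattern.search(text):
            codebook[code].append((i, text))
    return codebook


codebook = auto_code(segments, "innovativeness", pattern)

# Each entry points back to the exact passage, which is what lets you show
# where an interpretation came from and pull quotes into a results table.
for idx, quote in codebook["innovativeness"]:
    print(f"segment {idx}: {quote}")
```

A keyword hit is only a candidate for a code; the researcher still has to read each passage and decide whether it really expresses the construct.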