WEBVTT Kind: captions; Language: fi 1 00:00:05.200 --> 00:00:10.610 We can get started. Okay, well, thank you so much to the Language 2 00:00:10.610 --> 00:00:16.330 Campus here at the University of Jyväskylä for the opportunity to speak today 3 00:00:16.330 --> 00:00:21.970 about demystifying machine translation and improving machine translation literacy. 4 00:00:21.970 --> 00:00:26.080 My name is Lynn Bowker, and my regular job is as a 5 00:00:26.080 --> 00:00:31.380 professor at the School of Translation and Interpretation at the University of Ottawa 6 00:00:31.380 --> 00:00:36.380 in Canada, but for the months of November and December I'm being hosted 7 00:00:36.380 --> 00:00:41.540 as a visiting fellow here at the University of Jyväskylä. 8 00:00:41.540 --> 00:00:47.690 So I'm going to divide today's talk into two main parts. 9 00:00:47.690 --> 00:00:52.310 The first thing that I'm going to do is a little bit of a crash course 10 00:00:52.310 --> 00:00:57.150 to demystify machine translation and talk a little bit about 11 00:00:57.150 --> 00:01:01.750 what neural machine translation is, or how data-driven approaches to machine 12 00:01:01.750 --> 00:01:06.910 translation work, what algorithmic bias is, and a little bit of human-computer 13 00:01:06.910 --> 00:01:11.860 interaction. Essentially, this is a sort of crash course 14 00:01:11.860 --> 00:01:17.330 in machine translation literacy, so I thought I'd do that first. And then afterwards, 15 00:01:17.330 --> 00:01:22.110 once you understand what a course in machine translation literacy might look like, partly 16 00:01:22.110 --> 00:01:26.790 through our sort of trimmed-down version, then we can 17 00:01:26.790 --> 00:01:31.790 talk more about how you might like to participate in the teaching of machine translation 18 00:01:31.790 --> 00:01:36.880 literacy to others, whether they're your students or whether you work with other groups that might benefit 19 00:01:36.880 --> 00:01:41.670 from machine translation
literacy instruction, like teens or 20 00:01:41.670 --> 00:01:46.540 newcomers to a country, or other people that you might encounter. 21 00:01:46.540 --> 00:01:52.040 So those will be the two halves of the talk, and we can take questions 22 00:01:52.040 --> 00:01:57.310 after each half, if that makes sense. So if you have questions, pop them in the chat, 23 00:01:57.310 --> 00:02:01.790 and we can address them at the end of each half of the talk. 24 00:02:01.790 --> 00:02:03.630 Okay. 25 00:02:03.630 --> 00:02:08.330 So, demystifying machine translation. 26 00:02:08.330 --> 00:02:13.440 When we talk about machine translation, we're really talking about a 27 00:02:13.440 --> 00:02:18.750 category or type of tool that automatically translates 28 00:02:18.750 --> 00:02:24.060 text from one language, such as French, into another language, such as English. 29 00:02:24.060 --> 00:02:28.530 And it's a little bit up to you whether you ask the tool to translate just a 30 00:02:28.530 --> 00:02:33.680 word for you, a whole sentence, or maybe a longer text, but the point is that 31 00:02:33.680 --> 00:02:38.570 whatever text you offer to the machine, it will translate the whole 32 00:02:38.570 --> 00:02:43.630 thing for you with really no intervention on your part. So in that 33 00:02:43.630 --> 00:02:48.700 way, it's different from other types of tools that you might have used, like electronic 34 00:02:48.700 --> 00:02:53.520 dictionaries or concordancers or translation memory tools. Those are all kind of 35 00:02:53.520 --> 00:02:58.520 more in the category of computer-assisted translation: they do some of the work for you, 36 00:02:58.520 --> 00:03:03.710 but there's a lot of interaction between you and the tool. With machine translation, 37 00:03:03.710 --> 00:03:09.110 there can be some interaction, usually either before or after the translation process, 38 00:03:09.110 --> 00:03:13.600 but the translation process itself is carried out automatically. So, of 39 00:03:13.600 -->
00:03:18.810 course, one of the best-known examples of a machine translation tool is Google Translate. It's one that probably 40 00:03:18.810 --> 00:03:23.520 almost everyone has heard of, so that's the type of tool that we're talking about. 41 00:03:23.520 --> 00:03:29.010 You might wonder why we talk about machine translation 42 00:03:29.010 --> 00:03:33.630 and not computer translation, and this is really just a kind of 43 00:03:33.630 --> 00:03:38.570 throwback term. Machine translation technology is much 44 00:03:38.570 --> 00:03:44.060 older than people might realize. We've sort of encountered it in our daily lives 45 00:03:44.060 --> 00:03:48.620 only in the past ten to fifteen years, since tools like Google Translate 46 00:03:48.620 --> 00:03:53.620 became free online tools, but the technology itself goes back much 47 00:03:53.620 --> 00:03:58.680 further, to the period after the Second World War. That's when 48 00:03:58.680 --> 00:04:03.830 computers were just emerging, and when people talked about 49 00:04:03.830 --> 00:04:08.630 developing applications, machine translation was, ironically, one of the first things that 50 00:04:08.630 --> 00:04:13.530 they tried to do. So at that time, you can see that computers 51 00:04:13.530 --> 00:04:18.710 were really machines: they were gigantic, they filled an entire room, they had all 52 00:04:18.710 --> 00:04:23.700 kinds of tubes and wires, and so at that time we talked about computing 53 00:04:23.700 --> 00:04:28.810 machines. And so machine translation as a term is really just a 54 00:04:28.810 --> 00:04:34.030 kind of throwback to that period, even though now, of course, when we talk about 55 00:04:34.030 --> 00:04:39.520 computing machines, we talk about computers. But just to make that distinction that I 56 00:04:39.520 --> 00:04:44.040 opened with: machine translation is fully automatic, and computer-aided 57 00:04:44.040 --> 00:04:49.730 translation covers other types of tools that are used by translators but 58
00:04:49.730 --> 00:04:54.130 that require input from translators to be used effectively. 59 00:04:54.130 --> 00:04:55.660 Okay. 60 00:04:55.660 --> 00:05:00.520 Another really important thing to realize is that machine 61 00:05:00.520 --> 00:05:06.170 translation is a category of tools; it's not a single tool. 62 00:05:06.170 --> 00:05:10.060 Of course, Google Translate is very well known. It's very 63 00:05:10.060 --> 00:05:14.950 widely used: a recent estimate was that there are over one billion users of 64 00:05:14.950 --> 00:05:19.620 the Google Translate machine translation tool alone. But of course 65 00:05:19.620 --> 00:05:24.710 it's not the only one. There are a whole host of others. Here 66 00:05:24.710 --> 00:05:29.540 are some, and this is not a comprehensive list, just some examples of other 67 00:05:29.540 --> 00:05:34.890 tools that are in the same category. So all of these are machine translation tools, 68 00:05:34.890 --> 00:05:39.010 and this is sometimes worth keeping in mind, 69 00:05:39.010 --> 00:05:43.520 because even if our first instinct is to go to Google, 70 00:05:43.520 --> 00:05:48.550 we should realize that, just like any other type of product, different tools have 71 00:05:48.550 --> 00:05:53.900 different strengths. So if we think of other types of products, like cars, perhaps 72 00:05:53.900 --> 00:05:58.890 you are a fan of Nissan or of Ford or of 73 00:05:58.890 --> 00:06:03.640 another brand of car because it has certain features that you like. Well, machine translation is a 74 00:06:03.640 --> 00:06:08.930 little bit similar: you might find that different tools perform better or worse 75 00:06:08.930 --> 00:06:13.880 depending on the task that you have in mind, the language combination that you require, 76 00:06:13.880 --> 00:06:18.830 or the subject field that you're translating. So 77 00:06:18.830 --> 00:06:23.930 all of the tools that I've listed here are free and online, and here are some 78 00:06:23.930 --> 00:06:29.130 of their different
characteristics. Google certainly has the widest range of languages 79 00:06:29.130 --> 00:06:33.790 available, including quite a few less widely used languages, 80 00:06:33.790 --> 00:06:38.630 so if you are someone working with a language pair that is 81 00:06:38.630 --> 00:06:43.660 not very widely used, you might have a better chance of finding that pair handled by Google 82 00:06:43.660 --> 00:06:48.250 than by some other tools. Bing Microsoft Translator 83 00:06:48.250 --> 00:06:53.580 is an example that I like to point out because it offers different varieties 84 00:06:53.580 --> 00:06:57.850 of some languages. So in terms of French, for example, 85 00:06:57.850 --> 00:07:02.830 you could actually choose to translate into the Canadian version of French 86 00:07:02.830 --> 00:07:07.210 or other versions of French, so that's kind of a nice option for people 87 00:07:07.210 --> 00:07:11.900 who are users of a particular variety of a language. 88 00:07:11.900 --> 00:07:17.030 DeepL is another tool that is becoming very popular and has possibly 89 00:07:17.030 --> 00:07:21.950 got the highest quality for many Western European languages. It doesn't do as 90 00:07:21.950 --> 00:07:26.740 many languages as Google, but the ones that it does, it does quite well, so on the whole 91 00:07:26.740 --> 00:07:31.460 DeepL is known as being a high-quality tool. Papago is a 92 00:07:31.460 --> 00:07:36.670 tool that's strong for a language like Korean, Baidu for Chinese, 93 00:07:36.670 --> 00:07:41.260 and Yandex tends to be very strong for Eastern European languages. So again, it 94 00:07:41.260 --> 00:07:46.380 makes sense to pick your machine translation tool based on your immediate needs 95 00:07:46.380 --> 00:07:51.210 and to realize that while Google might do the job, it's also not the only 96 00:07:51.210 --> 00:07:56.020 option out there, because I think sometimes we forget to look beyond the 97 00:07:56.020 --> 00:08:00.770 obvious. So there are a whole host of tools out
there to choose from. 98 00:08:00.770 --> 00:08:06.340 So my advice with regard to translation tools 99 00:08:06.340 --> 00:08:11.140 is to try them out. Why not try more than one? If you're not happy with the results 100 00:08:11.140 --> 00:08:15.860 that you're getting from a particular tool, whether it's Google or another tool, 101 00:08:15.860 --> 00:08:20.810 they are free, they're easily available, so why not try another? And 102 00:08:20.810 --> 00:08:25.950 you might be happier with the results that you get from another tool. And it's also important 103 00:08:25.950 --> 00:08:30.860 to remember that these tools are under active development, so they 104 00:08:30.860 --> 00:08:35.850 are changing all the time. If you're disappointed with the results that you get from a 105 00:08:35.850 --> 00:08:40.790 tool on a particular occasion, don't write it off completely and say, "I'm never 106 00:08:40.790 --> 00:08:45.970 going to use that again, it did such a disastrous job." Maybe the text that you're working 107 00:08:45.970 --> 00:08:50.790 with just wasn't well suited to the language combination offered by the tool, 108 00:08:50.790 --> 00:08:56.020 or to that kind of text type, but each translation 109 00:08:56.020 --> 00:09:00.420 job is different, so don't completely give up on a tool because you have one bad 110 00:09:00.420 --> 00:09:05.150 experience. So a little bit of trial and error, I think, will help you 111 00:09:05.150 --> 00:09:09.730 figure out which tool works best for you and the types of texts that you normally work 112 00:09:09.730 --> 00:09:14.540 with. And just like any other type of tool, 113 00:09:14.540 --> 00:09:19.350 a tool that's good for one job might not be the right tool for the next one, so get 114 00:09:19.350 --> 00:09:23.960 into the habit of really evaluating, every time you have a translation task to do, 115 00:09:23.960 --> 00:09:28.590 which tool will be best suited for the task at hand. 116 00:09:28.590 --> 00:09:31.970 Okay.
117 00:09:31.970 --> 00:09:37.202 So it kind of raises the question: how does machine translation work, 118 00:09:37.202 --> 00:09:42.086 and why don't we get the same results from all tools? If I'm here 119 00:09:42.086 --> 00:09:47.318 recommending to you that you should try different tools with the same text, that is 120 00:09:47.318 --> 00:09:52.202 because you'll get different results with each tool that you try. And why is 121 00:09:52.202 --> 00:09:57.085 that? Why do we get different results from different machine translation systems? The 122 00:09:57.085 --> 00:10:02.317 reason is sort of because of how these tools work today. So in the past, 123 00:10:02.317 --> 00:10:07.201 as I talked about, you know, just after the Second World War, when 124 00:10:07.201 --> 00:10:12.433 we were just starting to develop machine translation systems, the approach that was taken at 125 00:10:12.433 --> 00:10:17.317 that time was very much based on linguistics. The developers of the tools thought 126 00:10:17.317 --> 00:10:22.200 that if they could build a big enough dictionary, and if they could encode
127 00:10:22.200 --> 00:10:27.140 all the grammatical rules about how to put different words together, 128 00:10:27.140 --> 00:10:31.360 then they would be able to solve the problem of machine translation. 129 00:10:31.360 --> 00:10:36.590 So they put their efforts into what was called a rule-based approach, where they 130 00:10:36.590 --> 00:10:41.540 were painstakingly encoding all the grammar rules for all the different languages and all 131 00:10:41.540 --> 00:10:46.750 the different combinations of languages, and building gigantic dictionaries, and 132 00:10:46.750 --> 00:10:51.510 they had some success, but it was definitely limited. 133 00:10:51.510 --> 00:10:56.580 We look back now at the quality produced by the early machine translation systems 134 00:10:56.580 --> 00:11:02.260 and we sort of think, well, that's sort of disappointing, it wasn't that good. 135 00:11:02.260 --> 00:11:06.600 And it's ironic, as I said, that they chose to start with machine 136 00:11:06.600 --> 00:11:11.710 translation as one of the earliest applications of computers, because translation is actually a very 137 00:11:11.710 --> 00:11:16.820 difficult task, and so it was a bit 138 00:11:16.820 --> 00:11:21.520 of a strange decision to sort of go for one of the hardest tasks right at the beginning. 139 00:11:21.520 --> 00:11:26.610 But eventually what they realized was, perhaps there's a 140 00:11:26.610 --> 00:11:31.650 better way. Instead of trying to make computers process 141 00:11:31.650 --> 00:11:37.180 language in a way that's similar to how people process language, maybe we could 142 00:11:37.180 --> 00:11:41.570 let the computers do what they're good at instead. And so what we 143 00:11:41.570 --> 00:11:46.890 see now is a really sharp turn. Since about 144 00:11:46.890 --> 00:11:51.890 the turn of the millennium, we've seen a completely different approach 145 00:11:51.890 --> 00:11:57.030 to machine translation. Instead of looking at linguistic
rules 146 00:11:57.030 --> 00:12:02.200 and repositories of lexicon, 147 00:12:02.200 --> 00:12:07.030 we are in an era now where we're taking a data-driven approach 148 00:12:07.030 --> 00:12:11.630 to machine translation, and it's a completely different way 149 00:12:11.630 --> 00:12:18.660 to approach the problem than the rule-based approach, or than how people actually process language. 150 00:12:18.660 --> 00:12:24.680 So let's look a little bit at why the early systems didn't work so well. And this is a slide 151 00:12:24.680 --> 00:12:30.340 that my son says is disturbing, but I hope I'll be able to explain it to you. 152 00:12:30.340 --> 00:12:36.240 If we are comparing the way that language processing takes place in people versus computers, 153 00:12:36.240 --> 00:12:40.590 one of the really important things that we need to realize is that 154 00:12:40.590 --> 00:12:45.500 people don't just use grammar and dictionaries. We actually use a 155 00:12:45.500 --> 00:12:50.570 lot of our real-world knowledge to interpret the text. 156 00:12:50.570 --> 00:12:55.950 We use context. And so here is an example: 157 00:12:55.950 --> 00:13:00.770 in French, the word avocat can have two meanings, 158 00:13:00.770 --> 00:13:05.770 so it can be translated from French into English either as 159 00:13:05.770 --> 00:13:10.850 lawyer or as avocado. So those are two 160 00:13:10.850 --> 00:13:16.030 possible and entirely legitimate translations of the word avocat: 161 00:13:16.030 --> 00:13:20.650 we can translate it as lawyer or as avocado. So 162 00:13:20.650 --> 00:13:25.750 as a person, you're probably not going to 163 00:13:25.750 --> 00:13:30.920 find it very difficult to make the choice, right? The context will tell 164 00:13:30.920 --> 00:13:36.050 you pretty clearly whether the correct translation is avocado 165 00:13:36.050 --> 00:13:41.090 or whether it's lawyer. And we do this by having a little bit of
real-world knowledge, like: 166 00:13:41.090 --> 00:13:46.070 lawyers are people and we don't eat people, or avocados are 167 00:13:46.070 --> 00:13:50.760 an inanimate kind of food and they can't go to 168 00:13:50.760 --> 00:13:55.760 court and make an argument. So it actually takes us not even a 169 00:13:55.760 --> 00:14:01.060 millisecond, probably, to process that information, just because we know how the world works. 170 00:14:01.060 --> 00:14:06.000 But if we give the same text to a machine translation system, like 171 00:14:06.000 --> 00:14:10.770 "J'ai mangé un avocat," we might naturally get a translation such 172 00:14:10.770 --> 00:14:15.900 as "I ate a lawyer," because lawyer is in the dictionary and 173 00:14:15.900 --> 00:14:21.160 the system doesn't have any real-world knowledge. So it really 174 00:14:21.160 --> 00:14:26.180 kind of hit a wall, what we call the semantic barrier, with this rule-based approach to 175 00:14:26.180 --> 00:14:31.200 machine translation. And so now we have actually turned the whole process 176 00:14:31.200 --> 00:14:35.840 upside down and said: computers are not good at real- 177 00:14:35.840 --> 00:14:40.840 world knowledge; they don't have any. So what can we ask them to do that they 178 00:14:40.840 --> 00:14:46.180 might be good at instead? Some of the greatest strengths of computers 179 00:14:46.180 --> 00:14:52.060 are not real-world knowledge but things like number crunching 180 00:14:52.060 --> 00:14:55.820 and pattern matching, and these are actually superpowers of 181 00:14:55.820 --> 00:15:00.790 computers. They can process so many numbers so quickly, it would make 182 00:15:00.790 --> 00:15:06.440 our heads spin, and they are relentless in finding every instance 183 00:15:06.440 --> 00:15:11.420 of a pattern. If you give a computer a gigantic corpus of billions of words and you ask 184 00:15:11.420 --> 00:15:16.760 it to find all the examples of a particular word, it can do that very quickly, in the blink 185 00:15:16.760 --> 00:15:21.820
of an eye, whereas it would take you years to read through billions of 186 00:15:21.820 --> 00:15:27.100 words of text. So we definitely have some strengths, but computers have their 187 00:15:27.100 --> 00:15:32.370 strengths too; the point is that they're very different, so we can't approach translation the same way 188 00:15:32.370 --> 00:15:37.410 with a computer as we would tackle that task ourselves. 189 00:15:37.410 --> 00:15:43.090 So the very hot, most recent, current 190 00:15:43.090 --> 00:15:47.540 approach to machine translation is what we call a data- 191 00:15:47.540 --> 00:15:52.780 driven approach. It's playing to those strengths of computers: it's allowing the computers 192 00:15:52.780 --> 00:15:57.620 to have access to and process large amounts of data, because that's 193 00:15:57.620 --> 00:16:02.560 one of the things that computers are really good at. So the most current approach is known as 194 00:16:02.560 --> 00:16:07.730 neural machine translation, and it uses artificial intelligence 195 00:16:07.730 --> 00:16:12.600 techniques. It has an artificial neural network, and here is a little diagram 196 00:16:12.600 --> 00:16:17.890 of a neural network, which is a network that takes in input. 197 00:16:17.890 --> 00:16:23.000 So the light green nodes on the left are the input; then in the middle, 198 00:16:23.000 --> 00:16:27.590 in light blue, it shows that the computer is doing some kind of processing, 199 00:16:27.590 --> 00:16:32.620 and it will output something, which is shown in yellow on the 200 00:16:32.620 --> 00:16:38.240 other side. Now, this particular example of a neural network has just one processing 201 00:16:38.240 --> 00:16:43.560 layer in the middle. A neural network can have many, many layers, and 202 00:16:43.560 --> 00:16:47.850 it's a little bit of a black box: what happens in there is some kind of mathematical 203 00:16:47.850 --> 00:16:52.750 stuff, and it's not essential for us to understand 204 00:16:52.750 --> 00:16:58.250 exactly
what's happening inside the neural network in order for us to appreciate 205 00:16:58.250 --> 00:17:02.890 some of the more general ways in which machine translation works. So don't worry if you don't 206 00:17:02.890 --> 00:17:08.240 understand all of the gory details of the neural network. It's enough to realize 207 00:17:08.240 --> 00:17:12.880 that it is a data-driven approach, and that it takes input data in the form of 208 00:17:12.880 --> 00:17:17.840 training data and will produce a sort of proposed translation for you at the 209 00:17:17.840 --> 00:17:23.340 end of the task. Now, the thing that is really important to understand, though, 210 00:17:23.340 --> 00:17:28.460 is that as input data, on the left, the 211 00:17:28.460 --> 00:17:33.470 data-driven machine translation systems don't just need one or two examples, 212 00:17:33.470 --> 00:17:37.930 and they don't even need just a hundred or a thousand examples in order 213 00:17:37.930 --> 00:17:42.960 to work. They really need millions and millions of examples, so 214 00:17:42.960 --> 00:17:48.290 we're talking about gigantic corpora. And at the bottom of the screen 215 00:17:48.290 --> 00:17:53.020 you can see just a very small 216 00:17:53.020 --> 00:17:58.060 example of a parallel corpus. It's a corpus where we 217 00:17:58.060 --> 00:18:03.180 have text in one language and its translation into another language, and 218 00:18:03.180 --> 00:18:08.040 these are aligned, usually at the sentence level, 219 00:18:08.040 --> 00:18:13.510 so that it's a clear example for the machine: if you encounter text like this 220 00:18:13.510 --> 00:18:18.110 sentence, it might look like that in another language. So 221 00:18:18.110 --> 00:18:22.970 if we are giving millions and millions of examples to a machine 222 00:18:22.970 --> 00:18:28.660 translation system, it can learn from them. It's a sort of artificial learning; it can learn from them, 223 00:18:28.660 --> 00:18:33.430 and it can
learn what seems to be a likely or reasonable 224 00:18:33.430 --> 00:18:37.880 translation based on all of the examples that it sees. 225 00:18:37.880 --> 00:18:42.905 The challenge with data-driven approaches, though, is that these approaches are 226 00:18:42.905 --> 00:18:47.930 also sensitive to data, and that means that, on the one hand, yes, 227 00:18:47.930 --> 00:18:53.342 we need many, many examples, but that's not enough. We can't just take lots 228 00:18:53.342 --> 00:18:57.980 and lots of text, grab whatever we find, and just throw it 229 00:18:57.980 --> 00:19:03.392 into a big pot. We need to put a little bit of care into the 230 00:19:03.392 --> 00:19:08.417 choice of the texts, because in addition to needing lots of data, 231 00:19:08.417 --> 00:19:13.442 we need the right kind of data. And in translation, this could mean 232 00:19:13.442 --> 00:19:18.080 that you need the right languages. Obviously, if you want to train a 233 00:19:18.080 --> 00:19:23.492 machine translation system to translate from Finnish into English, well, it doesn't make a 234 00:19:23.492 --> 00:19:28.130 lot of sense to give it information in Arabic and
235 00:19:28.130 --> 00:19:32.710 Spanish, right? So you have to, of course, have the right languages, but 236 00:19:32.710 --> 00:19:37.790 even more than that, you might like to think about having the right 237 00:19:37.790 --> 00:19:42.250 subject matter content. If you want your system to translate 238 00:19:42.250 --> 00:19:46.220 medical texts, then it doesn't make a lot of sense to train 239 00:19:46.220 --> 00:19:50.630 it with legal texts or some other domain. 240 00:19:50.630 --> 00:19:55.860 And so what we're finding in this data-driven world, with this data-driven 241 00:19:55.860 --> 00:20:00.980 approach to machine translation, is that finding the right kind of data 242 00:20:00.980 --> 00:20:06.200 can be a bit of a challenge, and it's led us to a situation that gets 243 00:20:06.200 --> 00:20:11.430 described as high-resource languages and low- 244 00:20:11.430 --> 00:20:16.850 resource languages. So high-resource languages could mean languages 245 00:20:16.850 --> 00:20:21.710 that are very widely spoken. English is a language that is very widely spoken 246 00:20:21.710 --> 00:20:26.790 around the world, so it's not that hard to find data in English. 247 00:20:26.790 --> 00:20:32.010 If we want to talk about language pairs, we might say, well, English and French is a 248 00:20:32.010 --> 00:20:36.870 high-resource language pair, because individually both English and French 249 00:20:36.870 --> 00:20:42.390 are widely spoken, and there are many contexts where texts are translated 250 00:20:42.390 --> 00:20:47.450 between those languages. We think of Canada, which has English and French as official languages, 251 00:20:47.450 --> 00:20:52.150 or the European Union, where English and French are both official languages, so 252 00:20:52.150 --> 00:20:57.070 we're not challenged too much when it comes to finding resources that 253 00:20:57.070 --> 00:21:01.980 we could use as examples to feed these machine translation systems. 254 00:21:01.980 --> 00:21:07.350 When we
get into languages that are less widely used, well, I'm here in Finland 255 00:21:07.350 --> 00:21:12.120 at the moment. Finnish is a language that is the national language of Finland, but it's not 256 00:21:12.120 --> 00:21:17.710 widely spoken anywhere else in the world, so it's not a high-resource language. 257 00:21:17.710 --> 00:21:22.140 And if we were to think about translating between Finnish 258 00:21:22.140 --> 00:21:27.350 and another language that's not highly used, then we have 259 00:21:27.350 --> 00:21:32.070 not only two individual languages that are less widely used, but 260 00:21:32.070 --> 00:21:37.300 the translations between them might be difficult to obtain as well. So maybe, like, Finnish and 261 00:21:37.300 --> 00:21:42.440 Greek: maybe there are not a lot of examples readily available 262 00:21:42.440 --> 00:21:47.340 between those two, right? So that means that the machine 263 00:21:47.340 --> 00:21:52.090 translation systems for the high-resource languages tend to produce better quality, 264 00:21:52.090 --> 00:21:57.419 because they have more examples to look at, and the machine translation between language 265 00:21:57.419 --> 00:22:02.392 pairs that are less widely used might have fewer examples to draw from and 266 00:22:02.392 --> 00:22:07.366 then may not produce quite as high quality. It gets even more complicated 267 00:22:07.366 --> 00:22:12.339 when we want to translate very specialized domains, because the same thing happens: 268 00:22:12.339 --> 00:22:17.313 some domains are very common, or we find translations in them. So topics that 269 00:22:17.313 --> 00:22:22.286 are widely talked about in governments tend to have a lot of corpus resources 270 00:22:22.286 --> 00:22:27.260 available. So we might see, you know, in the European Union, or again the 271 00:22:27.260 --> 00:22:32.233 Canadian government, whatever the governments are talking about, we have examples of. But the 272 00:22:32.233 --> 00:22:37.207 governments
don't talk about everything, or they have their own particular variety of language, 273 00:22:37.207 --> 00:22:42.180 a sort of bureaucratic, administrative language, and it may be less 274 00:22:42.180 --> 00:22:47.840 easy to find very technical texts, for example, to put in our corpora. 275 00:22:47.840 --> 00:22:52.870 So when we're dealing with a combination of potentially less widely 276 00:22:52.870 --> 00:22:57.930 used languages and less common domains, all of a sudden it's quite challenging to come up 277 00:22:57.930 --> 00:23:03.180 with billions of examples, and so this has created a situation 278 00:23:03.180 --> 00:23:08.200 where machine translation works better for some languages and topics than for others. 279 00:23:08.200 --> 00:23:13.580 Something else that we need to keep in mind is this concept 280 00:23:13.580 --> 00:23:18.470 of algorithmic bias. So we might say that 281 00:23:18.470 --> 00:23:23.630 a neural machine translation system is a little bit like a baby when it starts out. 282 00:23:23.630 --> 00:23:28.840 Like a baby, it doesn't know anything, and what it's going to learn is 283 00:23:28.840 --> 00:23:33.700 what we teach it. So we have to be careful about what we teach it, right? 284 00:23:33.700 --> 00:23:38.720 Just like we have to maybe be careful that we don't swear in front of our children, or 285 00:23:38.720 --> 00:23:43.680 that we don't talk about certain topics in front of our children, we have to 286 00:23:43.680 --> 00:23:48.660 think about the machine translation system in the same way: it's going to learn 287 00:23:48.660 --> 00:23:54.390 what we teach it. So if the training data is not well chosen, 288 00:23:54.390 --> 00:23:58.770 if it contains language that we might consider to be sexist or racist, 289 00:23:58.770 --> 00:24:04.000 then the machine translation system can potentially learn this and replicate 290 00:24:04.000 --> 00:24:08.930 it. So I've got some examples, and maybe you can
tell me 291 00:24:08.930 --> 00:24:14.060 if the translation is correct, but I understand that in Finnish 292 00:24:14.060 --> 00:24:19.360 there's a gender-neutral pronoun, so the pronoun doesn't 293 00:24:19.360 --> 00:24:24.530 take a he or a she the way that we would have to make that decision in English. 294 00:24:24.530 --> 00:24:28.870 So when translating from Finnish into English, the system has to 295 00:24:28.870 --> 00:24:34.280 make a choice, and in a data-driven system it's going to make that choice 296 00:24:34.280 --> 00:24:38.870 by what it usually sees, what is more typical, what it has learned 297 00:24:38.870 --> 00:24:44.280 from the corpus that it has been given. And we can see here that it's making very 298 00:24:44.280 --> 00:24:48.840 different choices between he and she depending on what comes next. 299 00:24:48.840 --> 00:24:54.150 So he is a leader, but she has a grandchild; 300 00:24:54.150 --> 00:24:59.160 he works, and she's taking care of the child. So it's 301 00:24:59.160 --> 00:25:03.940 turning out language where, you know, there's no reason that it 302 00:25:03.940 --> 00:25:09.050 should have chosen he or she, except that those are the examples that 303 00:25:09.050 --> 00:25:14.090 it has seen. So this is a problem, which 304 00:25:14.090 --> 00:25:19.570 is called algorithmic bias or machine bias, but the problem is not really 305 00:25:19.570 --> 00:25:24.210 with the tool or the technology; it's with the data, 306 00:25:24.210 --> 00:25:29.460 and the data has been created by people. So people are the problem. 307 00:25:29.460 --> 00:25:34.340 We are a society, maybe, 308 00:25:34.340 --> 00:25:39.150 that doesn't reflect the ideals of not having, you know, 309 00:25:39.150 --> 00:25:44.890 sexism or racism, and because our texts reflect our society, 310 00:25:44.890 --> 00:25:50.200 the machine learns from the texts that we provide it. And so it maybe means that we need to
work 311 00:25:50.200 --> 00:25:55.510 on ourselves and work on our own social issues, 312 00:25:55.510 --> 00:26:02.230 but it underscores the point that it's important to pick the training data carefully, 313 00:26:02.230 --> 00:26:06.340 because the machine will learn what it's taught. 314 00:26:06.340 --> 00:26:09.630 Okay. 315 00:26:09.630 --> 00:26:14.870 Something else that I think we need to 316 00:26:14.870 --> 00:26:19.930 be aware of when we're using machine translation is that one of our main jobs 317 00:26:19.930 --> 00:26:25.000 is to conduct a little bit of risk assessment. When we 318 00:26:25.000 --> 00:26:29.850 decide whether to use a machine translation tool or not, we should take 319 00:26:29.850 --> 00:26:34.800 into account the situation that we're in and the reason that we're using 320 00:26:34.800 --> 00:26:39.800 the tool, because some situations are more, what we might consider, high 321 00:26:39.800 --> 00:26:44.890 stakes than others, meaning the consequences of getting it wrong could be 322 00:26:44.890 --> 00:26:50.270 quite serious in some cases, whereas in other contexts it's really not a big deal 323 00:26:50.270 --> 00:26:55.070 if the translation isn't so good. So we can think of contexts 324 00:26:55.070 --> 00:26:59.860 such as visiting a doctor for a health problem, where the 325 00:26:59.860 --> 00:27:05.660 stakes of having a bad translation in the context of health care are more serious 326 00:27:05.660 --> 00:27:10.050 than in a context such as translating a 327 00:27:10.050 --> 00:27:15.110 manga comic for entertainment, right? If the translation of your manga comic 328 00:27:15.110 --> 00:27:20.090 is terrible, you might be disappointed, but you're not going to suffer 329 00:27:20.090 --> 00:27:25.130 any serious consequences from it. So this is something that I think 330 00:27:25.130 --> 00:27:30.370 is, again, not really about the tool itself, but it's about 331 00:27:30.370 --> 00:27:35.160 our judgment. It's
about learning to evaluate what's a good 332 00:27:35.160 --> 00:27:40.500 context for using machine translation and what's a less good context. Now, 333 00:27:40.500 --> 00:27:45.520 sometimes, of course, we are also faced with the situation where, 334 00:27:45.520 --> 00:27:50.180 well, is a potentially poor or potentially less than 335 00:27:50.180 --> 00:27:55.293 perfect translation still better than no translation at all? And that's where we get 336 00:27:55.293 --> 00:28:00.406 into making a very difficult decision. So sometimes we would say, let's give machine 337 00:28:00.406 --> 00:28:05.519 translation a try, because the alternative is no translation, and that's certainly not helpful. 338 00:28:05.519 --> 00:28:10.632 And I'll just take a second to 339 00:28:10.632 --> 00:28:15.745 talk about another issue, which is transparency. If we are using machine translation in a 340 00:28:15.745 --> 00:28:20.493 situation where we're saying we have no choice because the 341 00:28:20.493 --> 00:28:25.606 alternative is nothing, it's important, I think, still to make clear 342 00:28:25.606 --> 00:28:30.719 to people that it is machine translation, because that allows them to assess 343 00:28:30.719 --> 00:28:35.832 their own kind of risk tolerance for trusting the information that they're 344 00:28:35.832 --> 00:28:40.580 being presented with. So transparency is another kind of issue that goes along 345 00:28:40.580 --> 00:28:44.450 with the risk assessment idea.
346 00:28:44.450 --> 00:28:48.990 I'll just mention, at the end of the slide, a 347 00:28:48.990 --> 00:28:53.990 little sort of warning: when you're using free online 348 00:28:53.990 --> 00:28:58.060 machine translation tools, such as Google Translate but 349 00:28:58.060 --> 00:29:03.520 not only that one, usually the terms 350 00:29:03.520 --> 00:29:08.260 and conditions of using that tool include that they may 351 00:29:08.260 --> 00:29:13.620 keep your data and do other things with your data; maybe they'll use it for more training 352 00:29:13.620 --> 00:29:18.620 or, you know, for some other purpose that they have. So it's a 353 00:29:18.620 --> 00:29:23.230 good idea not to put confidential information into an online machine 354 00:29:23.230 --> 00:29:28.390 translation tool. You might think that when you close the window it just disappears, 355 00:29:28.390 --> 00:29:33.490 but it doesn't disappear. Google has your information now, and they can use it 356 00:29:33.490 --> 00:29:38.400 according to the terms and conditions that they've laid out, which none of us ever read. So 357 00:29:38.400 --> 00:29:43.780 we just have to be a little bit careful. Don't put your banking information into an online machine 358 00:29:43.780 --> 00:29:48.380 translation tool; don't put really personal information that you don't want to be out in the world, 359 00:29:48.380 --> 00:29:55.400 because it could be out in the world once you've put it into a free online machine translation tool. 360 00:29:55.400 --> 00:29:58.090 Okay.
361 00:29:58.090 --> 00:30:03.270 I wanted to raise a very specific question, because we're here 362 00:30:03.270 --> 00:30:07.910 in a university context, and I work in a university context back in Canada. A question 363 00:30:07.910 --> 00:30:12.860 that I'm getting a lot is, "Can I use machine translation for my 364 00:30:12.860 --> 00:30:17.560 coursework?" Where does that fall in the risk assessment of things? 365 00:30:17.560 --> 00:30:22.430 It's a little bit of a gray area, I think. 366 00:30:22.430 --> 00:30:26.730 Students are using machine translation, and many language teachers wish 367 00:30:26.730 --> 00:30:31.390 that they wouldn't. But where does coursework kind of lie 368 00:30:31.390 --> 00:30:35.920 on this spectrum? One of the things that I've started to do is sort 369 00:30:35.920 --> 00:30:40.260 of ask: what is the learning objective of the course? 370 00:30:40.260 --> 00:30:43.130 Not all courses have the same learning objectives. 371 00:30:43.130 --> 00:30:48.880 So maybe you are an international student; you've come from another country to 372 00:30:48.880 --> 00:30:53.920 Finland to study, and you are here studying science, for example. 373 00:30:53.920 --> 00:30:58.790 And so the objective of your science course is to learn science. 374 00:30:58.790 --> 00:31:03.560 That's the point of your science course: to learn the subject matter of science. 375 00:31:03.560 --> 00:31:08.200 And so if you are using machine translation because you have to 376 00:31:08.200 --> 00:31:13.190 hand your work in, in Finnish or in English, and that's 377 00:31:13.190 --> 00:31:18.470 not the language that you normally speak, you might say, "Well, I'm using it because 378 00:31:18.470 --> 00:31:23.490 I want to submit my work to my professor. 379 00:31:23.490 --> 00:31:28.250 I've learned the concepts of science, and I just need some help expressing 380 00:31:28.250 --> 00:31:33.210 them." So to my mind, this is a very
different objective than being in a 381 00:31:33.210 --> 00:31:40.020 language course, where the objective is to learn the language, maybe French. 382 00:31:40.020 --> 00:31:45.770 And so I think that it's, you know, a question of 383 00:31:45.770 --> 00:31:50.150 what is the purpose of the course, and certainly also how does the 384 00:31:50.150 --> 00:31:55.380 teacher feel about it. But in either case, I think what we're doing is trying 385 00:31:55.380 --> 00:32:00.260 to use machine translation as a sort of writing aid, rather than necessarily as 386 00:32:00.260 --> 00:32:05.260 something that will do the whole translation task for us. So if you're the science student 387 00:32:05.260 --> 00:32:10.110 and you are just trying to hand in your work in English and demonstrate that you have an understanding of the 388 00:32:10.110 --> 00:32:15.280 concepts, one of the things that you'd need to work on is writing your source text 389 00:32:15.280 --> 00:32:20.150 very, very well. So you're writing the source text in your own language, but, 390 00:32:20.150 --> 00:32:25.650 you know, this is an obvious thing to translators but less obvious to people who aren't translators: 391 00:32:25.650 --> 00:32:30.130 if your source text is badly written, the translation is going to be a disaster. 392 00:32:30.130 --> 00:32:35.060 So you have to really work on creating a good source text in 393 00:32:35.060 --> 00:32:39.960 your own language in order for it to be well translated into another language. 394 00:32:39.960 --> 00:32:44.700 So this is actually helping you, I think, to kind of learn the concepts, 395 00:32:44.700 --> 00:32:49.200 and the learning objective of the course is to learn science. So 396 00:32:49.200 --> 00:32:53.940 you have to learn to write science really well in your own language in order to be able to 397 00:32:53.940 --> 00:32:58.970 benefit from machine translation, and hopefully machine translation is 398 00:32:58.970 -->
00:33:03.530 you know, a tool that you will become less reliant on over time. But I don't think 399 00:33:03.530 --> 00:33:08.130 that goes against the objectives of the course. In language learning, 400 00:33:08.130 --> 00:33:13.510 what I usually advise students is to try it yourself first 401 00:33:13.510 --> 00:33:17.870 and use machine translation more as a verification tool. So see 402 00:33:17.870 --> 00:33:22.080 how you would do it, and then you can compare your results to the machine translation 403 00:33:22.080 --> 00:33:26.630 tool, and treat it as a writing aid rather than 404 00:33:26.630 --> 00:33:31.130 offloading the entire task onto the machine translation system. 405 00:33:31.130 --> 00:33:36.520 Either way, though, I think it's important to be respectful of your teacher's 406 00:33:36.520 --> 00:33:41.150 instructions. If they absolutely do not want you to use it, then you should respect 407 00:33:41.150 --> 00:33:46.220 what they've asked of you. What I try to do now is to say to my students, "I'm going 408 00:33:46.220 --> 00:33:51.140 to let you use machine translation for many different activities, because it's 409 00:33:51.140 --> 00:33:56.290 out there, professionals are using it, and it seems silly not to learn how to use 410 00:33:56.290 --> 00:34:01.480 it and work well with it. On occasion there may be something I want you to learn, and I would appreciate 411 00:34:01.480 --> 00:34:06.370 that you not use machine translation right away." And so I think the students are more receptive 412 00:34:06.370 --> 00:34:11.180 to that, because it's not outright forbidden, and they know that if I'm asking them not to use it 413 00:34:11.180 --> 00:34:16.190 for a particular task, I have a well-thought-out reason. So rather than a blanket kind 414 00:34:16.190 --> 00:34:21.150 of policy, yes or no, it's sort of "it depends on what we're doing." 415 00:34:21.150 --> 00:34:26.080 But I do ask students always to be transparent: so if you used machine
translation 416 00:34:26.080 --> 00:34:31.100 for your assignment, tell me: tell me which tool you used, tell me which parts of the text 417 00:34:31.100 --> 00:34:35.800 you used it for, and show me clearly how you were able to improve it, 418 00:34:35.800 --> 00:34:40.490 because chances are it can be improved. Even if you are using it to get the 419 00:34:40.490 --> 00:34:45.310 gist of something, chances are that you, as the 420 00:34:45.310 --> 00:34:50.490 translator trainee or language student, are going to be able to recommend some ways of improving 421 00:34:50.490 --> 00:34:56.250 it, because we just operate on a different level. 422 00:34:56.250 --> 00:34:59.770 And of course, if you're using it 423 00:34:59.770 --> 00:35:05.510 in an essay to take the ideas that you found in another paper, 424 00:35:05.510 --> 00:35:09.960 you still need to cite those ideas. Even if you're changing the language in which they're being 425 00:35:09.960 --> 00:35:15.800 expressed, translating something doesn't absolve you of the need to attribute properly 426 00:35:15.800 --> 00:35:20.040 through referencing and citation. So those are some questions that have come 427 00:35:20.040 --> 00:35:25.450 up in my kind of language teaching environment, and so obviously 428 00:35:25.450 --> 00:35:30.750 that's sort of the advice that I'm giving. It will be interesting to have a discussion 429 00:35:30.750 --> 00:35:35.320 and see what others are doing or what other ideas people have. Mostly, 430 00:35:35.320 --> 00:35:40.230 though, I would say that, you know, forbidding it doesn't work. In fact, it sort of makes it 431 00:35:40.230 --> 00:35:45.710 more enticing, and so I don't think it makes a lot of sense to forbid machine translation 432 00:35:45.710 --> 00:35:50.340 at this point. It's here, and it's not going away; the students are using it whether we think 433 00:35:50.340 --> 00:35:55.970 they should or not. So I see my responsibility as helping them use it in a smarter way, rather 434
00:35:55.970 --> 00:36:01.470 than trying to clamp down on the use of machine translation. 435 00:36:01.470 --> 00:36:06.170 So that's the end of the crash course on machine 436 00:36:06.170 --> 00:36:10.890 translation literacy. So I wonder if it makes sense, 437 00:36:10.890 --> 00:36:14.720 if there are any questions on that part, to take them now, and then I'll talk a little 438 00:36:14.720 --> 00:36:19.130 bit more about my teaching experience in the second part. 439 00:36:19.130 --> 00:36:20.930 All right. 440 00:36:20.930 --> 00:36:26.011 "Do we have any idea where Google and other 441 00:36:26.011 --> 00:36:31.093 machine translation companies get their parallel corpora from?" Yes, I would say 442 00:36:31.093 --> 00:36:36.174 that they get it from a wide variety of sources. Probably 443 00:36:36.174 --> 00:36:41.255 they do a lot of what we might describe as scraping from 444 00:36:41.255 --> 00:36:46.337 the internet of anything that is sort of freely available. 445 00:36:46.337 --> 00:36:51.418 So many of the European Union and the Canadian government web sites 446 00:36:51.418 --> 00:36:56.499 are multilingual, for example, and, you know, lots of international 447 00:36:56.499 --> 00:37:01.581 or multinational companies would be multilingual and have their web 448 00:37:01.581 --> 00:37:06.662 sites in different languages. So I suspect that there's a lot of 449 00:37:06.662 --> 00:37:11.320 scraping of data from existing free
450 00:37:11.320 --> 00:37:17.190 websites. But also, everything that gets entered into 451 00:37:17.190 --> 00:37:21.930 the Google machine translation system and translated is probably captured 452 00:37:21.930 --> 00:37:27.500 and stored and used as a further level of training. So 453 00:37:27.500 --> 00:37:32.110 the more we use the systems, the more we're actually generating data for them. 454 00:37:32.110 --> 00:37:38.000 And you may have noticed that some of the tools actually give you the option of editing, 455 00:37:38.000 --> 00:37:43.230 improving the translation a little bit. So they love to see that as well, as you're kind of doing some 456 00:37:43.230 --> 00:37:48.730 work for them in making the translations better, and so it gets to be 457 00:37:48.730 --> 00:37:54.200 a little bit of a cycle that feeds itself, to some extent. 458 00:37:54.200 --> 00:37:59.790 [inaudible] 459 00:37:59.790 --> 00:38:04.790 [audience comment, inaudible] 460 00:38:04.790 --> 00:38:09.280 Exactly, yes, that's what I mean. It's 461 00:38:09.280 --> 00:38:14.630 sort of feeding itself a little bit. Some of the most active current 462 00:38:14.630 --> 00:38:19.400 research in machine translation is on what they call synthetic data, 463 00:38:19.400 --> 00:38:24.350 so it's data that's a little bit artificial. It could mean that it was translated by a machine 464 00:38:24.350 --> 00:38:29.840 itself or generated somehow through another type of AI tool. So, 465 00:38:29.840 --> 00:38:34.690 absolutely, yes, what we're seeing is that machine translation is being trained with 466 00:38:34.690 --> 00:38:39.380 machine translation data, to some extent. 467 00:38:39.380 --> 00:38:44.910 All right, so the next part of the talk 468 00:38:44.910 --> 00:38:50.230 that I'd like to share with you is a little bit about my experience of trying 469 00:38:50.230 --> 00:38:55.780 to develop and teach machine translation
literacy to people who are not 470 00:38:55.780 --> 00:39:00.850 translators, or even necessarily language students, and who may not 471 00:39:00.850 --> 00:39:06.130 have a strong background in formal language learning. 472 00:39:06.130 --> 00:39:07.980 Okay. 473 00:39:07.980 --> 00:39:13.100 So when I talk about machine translation literacy, what 474 00:39:13.100 --> 00:39:18.190 I'm really trying to describe is a sort of type of digital plus 475 00:39:18.190 --> 00:39:23.030 information literacy that focuses on developing critical thinking 476 00:39:23.030 --> 00:39:28.150 skills rather than technical competence. So it's not really about 477 00:39:28.150 --> 00:39:33.340 how to use machine translation in the sense of which button to push, 478 00:39:33.340 --> 00:39:38.220 because really the technology is super simple, right? It's sort of like copy, 479 00:39:38.220 --> 00:39:43.280 paste, click, and that's all there is to using it technically, 480 00:39:43.280 --> 00:39:47.940 unlike some other types of software, which are very sophisticated and really require a lot of effort 481 00:39:47.940 --> 00:39:53.300 to learn how to use, with a gigantic user manual that you have to read. For the user, machine translation 482 00:39:53.300 --> 00:39:57.860 is very simple software. 483 00:39:57.860 --> 00:40:02.890 So it's not about learning which button to push, but it's really 484 00:40:02.890 --> 00:40:08.640 about making decisions such as: should I be using machine translation in this context? 485 00:40:08.640 --> 00:40:13.310 What are the consequences of using it in this context? What 486 00:40:13.310 --> 00:40:19.010 might I use instead of machine translation? Or, should I use machine translation 487 00:40:19.010 --> 00:40:23.130 this time but not next time? So it's really about this sort of cognitive 488 00:40:23.130 --> 00:40:28.110 thinking that goes around machine translation use, rather than 489 00:40:28.110 --> 00:40:33.210 the technical side of it.
We might also think about how 490 00:40:33.210 --> 00:40:38.250 we as people could interact with the machine translation system to improve 491 00:40:38.250 --> 00:40:43.050 the output. As I said, sometimes that interaction might be before we give text to 492 00:40:43.050 --> 00:40:48.000 the machine translation system; sometimes it might be taking the output of the machine translation system 493 00:40:48.000 --> 00:40:53.110 and trying to make it better before we use it for a certain purpose. So what is our role 494 00:40:53.110 --> 00:40:58.030 in the process, and how can we work together with the machine? 495 00:40:58.030 --> 00:41:03.190 And so by asking this type of question, we sort of have the potential to become 496 00:41:03.190 --> 00:41:08.230 more informed and more critical users, instead of just doing this on autopilot, 497 00:41:08.230 --> 00:41:13.290 where we copy, paste, click, done, next. 498 00:41:13.290 --> 00:41:18.420 But I do have to say that when I first started to think about machine translation literacy, I 499 00:41:18.420 --> 00:41:23.980 thought of it as one concept, but now I realize that there are vastly 500 00:41:23.980 --> 00:41:28.750 different types of users out there and different purposes 501 00:41:28.750 --> 00:41:33.270 that they might have for using machine translation. Now I see machine translation literacy 502 00:41:33.270 --> 00:41:38.470 as much more of a customizable concept. Language students might need 503 00:41:38.470 --> 00:41:43.320 different information about machine translation than people who are using 504 00:41:43.320 --> 00:41:48.270 it for leisure purposes, or people who are trying to use it in their workplace. So 505 00:41:48.270 --> 00:41:53.110 machine translation literacy, just like machine translation use, might look different. 506 00:41:53.110 --> 00:41:57.700 So I'm talking about my experience today, which is mostly working with 507 00:41:57.700 --> 00:42:02.590 students at universities,
but students who are not translation 508 00:42:02.590 --> 00:42:07.040 students. So that's my context, but you might want to adjust machine translation literacy 509 00:42:07.040 --> 00:42:11.540 a little bit depending on the type of people that you're working with. 510 00:42:11.540 --> 00:42:14.150 Okay. 511 00:42:14.150 --> 00:42:19.380 So firstly, what's driving the need for machine translation 512 00:42:19.380 --> 00:42:24.650 literacy? Why are we even talking about this here today? Of course, one reason 513 00:42:24.650 --> 00:42:29.630 I've already alluded to briefly is that machine translation is now 514 00:42:29.630 --> 00:42:34.310 out there, right? For many years, starting from the period 515 00:42:34.310 --> 00:42:39.310 after the Second World War up to around the year two thousand or even two thousand 516 00:42:39.310 --> 00:42:44.440 and five, machine translation wasn't freely available. It wasn't easy to get your 517 00:42:44.440 --> 00:42:49.360 hands on; it was only researchers and developers and maybe professional translators that 518 00:42:49.360 --> 00:42:54.470 had access to the technology. But in the last fifteen years, that has been turned upside 519 00:42:54.470 --> 00:42:59.600 down completely by tools like Google Translate. Now anyone can get their hands 520 00:42:59.600 --> 00:43:04.430 on machine translation: all you need is an internet connection, and you've got machine translation 521 00:43:04.430 --> 00:43:06.020 available to you. 522 00:43:06.020 --> 00:43:09.130 So that's one big change.
523 00:43:09.130 --> 00:43:15.040 The second point (everything is hopefully numbered here), 524 00:43:15.040 --> 00:43:20.060 this second item, is that, again, as I mentioned, machine translation is 525 00:43:20.060 --> 00:43:25.180 very easy to use, and in some ways it's almost too easy, because when something is 526 00:43:25.180 --> 00:43:30.750 super easy to do, we don't think about it; we just do it, like on autopilot. 527 00:43:30.750 --> 00:43:35.810 So the fact that it's so simple to use puts us in a risky 528 00:43:35.810 --> 00:43:41.140 situation where we don't necessarily take the time to think about whether we should be using it. 529 00:43:41.140 --> 00:43:46.200 We've also talked about the data-driven approach to machine translation, and one of 530 00:43:46.200 --> 00:43:51.280 the big differences between the output of data-driven approaches 531 00:43:51.280 --> 00:43:56.210 to machine translation and those old rule-based approaches to machine translation is 532 00:43:56.210 --> 00:44:01.120 that the output of machine translation using a data-driven 533 00:44:01.120 --> 00:44:06.280 approach sounds good. The older approaches produced really 534 00:44:06.280 --> 00:44:11.790 clunky sentences, and it was actually quite easy to tell 535 00:44:11.790 --> 00:44:16.410 that it was machine translation, because it just sounded weird; it sounded 536 00:44:16.410 --> 00:44:21.630 clunky. We might even call it a sort of translationese: you know, when you're reading a translation, 537 00:44:21.630 --> 00:44:26.300 it just sounds off. The words might be in the language, 538 00:44:26.300 --> 00:44:31.090 but they don't go together well; it just sounds weird. 539 00:44:31.090 --> 00:44:36.660 With data-driven machine translation, it sounds good, even if it's not right. 540 00:44:36.660 --> 00:44:41.690 It sounds good, so it lulls us into this false sense of security: it sounds 541 00:44:41.690 --> 00:44:47.270 reasonable, so it must be right. And sometimes
it's not correct, so it's harder 542 00:44:47.270 --> 00:44:52.430 to spot the errors, and the errors are different from what we were used to seeing 543 00:44:52.430 --> 00:44:57.360 with the older approaches to machine translation. So that's a challenge. 544 00:44:57.360 --> 00:45:02.660 And something else is that the popular media really seem to like talking about machine 545 00:45:02.660 --> 00:45:08.040 translation; it's now a fairly regular topic in the popular media, 546 00:45:08.040 --> 00:45:13.410 but the problem is that the popular media tend to present a sort of binary 547 00:45:13.410 --> 00:45:17.980 description of machine translation. Either 548 00:45:17.980 --> 00:45:22.770 they are picking on the mistakes and sort of saying that it's 549 00:45:22.770 --> 00:45:27.710 terrible, it's a horrible tool, or they're really hyping it up at the 550 00:45:27.710 --> 00:45:32.740 other end and saying, like, "Wow, pretty soon we won't even need translators; these are amazing, they're 551 00:45:32.740 --> 00:45:38.120 practically magic." And those two extremes are 552 00:45:38.120 --> 00:45:42.770 not the reality for most machine translation, which really falls somewhere in the 553 00:45:42.770 --> 00:45:47.770 middle. So we have to be a little bit careful about what we read in the popular media, because 554 00:45:47.770 --> 00:45:53.080 it's not always the case that the journalist understands the technology, so they're not presenting 555 00:45:53.080 --> 00:45:58.130 it necessarily in the best or most nuanced light. 556 00:45:58.130 --> 00:45:59.900 Okay.
557 00:45:59.900 --> 00:46:05.190 So if all of those things are happening, and we don't have a background 558 00:46:05.190 --> 00:46:10.710 in translation, the tools are really easy to use, and the media are telling us that they're amazing 559 00:46:10.710 --> 00:46:15.040 or they're terrible, I mean, where does this leave 560 00:46:15.040 --> 00:46:19.980 people? Why do we think that people should instinctively know how to use 561 00:46:19.980 --> 00:46:25.150 machine translation in a smart way? And the answer is that they don't. 562 00:46:25.150 --> 00:46:30.330 Why would someone who has no background in translation know how to use a translation tool 563 00:46:30.330 --> 00:46:35.160 in a smart way? It's like: I have no background in medicine, so don't let me loose 564 00:46:35.160 --> 00:46:40.070 in an operating theatre and let me start operating on people. That would 565 00:46:40.070 --> 00:46:45.180 be a terrible idea. So why do we think that people who have no background in translation should 566 00:46:45.180 --> 00:46:50.020 just know how to use machine translation in a smart way? They don't. And so I 567 00:46:50.020 --> 00:46:54.220 feel that we in the language professions or the language industry have a little bit 568 00:46:54.220 --> 00:46:58.680 of a social responsibility to help them develop good 569 00:46:58.680 --> 00:47:02.520 critical thinking skills around machine translation. 570 00:47:02.520 --> 00:47:08.420 So I've started a programme of machine translation literacy at my own university, 571 00:47:08.420 --> 00:47:12.570 and it basically looks like the first half of the presentation 572 00:47:12.570 --> 00:47:17.630 today, where we talk a little bit about how machine translation works, what are the 573 00:47:17.630 --> 00:47:22.750 kind of implications of the data-driven approach to machine translation, and the fact that there 574 00:47:22.750 --> 00:47:27.730 are multiple machine translation systems out there that we can try. We talk 575 00:47:27.730
--> 00:47:32.710 a little bit about this risk assessment idea: which tasks are high stakes, which 576 00:47:32.710 --> 00:47:37.740 are low stakes, and how can we develop good judgment about when to use machine 577 00:47:37.740 --> 00:47:43.280 translation? We talk about transparency, that it's important to 578 00:47:43.280 --> 00:47:47.640 be honest. If you're using a machine translation tool, you don't need to be ashamed 579 00:47:47.640 --> 00:47:52.650 of it. It's not shameful to use machine translation, but you do need to be transparent about it, because that 580 00:47:52.650 --> 00:47:58.040 allows the recipient of the translated information to determine their own 581 00:47:58.040 --> 00:48:03.100 kind of comfort level in trusting that information. And we talk 582 00:48:03.100 --> 00:48:08.060 about interacting with machine translation, this idea of "garbage in equals 583 00:48:08.060 --> 00:48:13.070 garbage out", right? That if you want a good translation, you need to provide a good source 584 00:48:13.070 --> 00:48:18.100 text, and that at the other end, you know, the translation that's 585 00:48:18.100 --> 00:48:23.030 produced by the machine might not be perfect. And so, depending on what 586 00:48:23.030 --> 00:48:28.470 you're planning to do with that information, you might need to improve the text; you might need to do some editing, 587 00:48:28.470 --> 00:48:33.160 or arrange for someone else to do the editing if you can't do it yourself. But it may 588 00:48:33.160 --> 00:48:38.360 not be enough to just stop with the raw machine translation output. So that's the kind 589 00:48:38.360 --> 00:48:43.180 of thing that I talk about in machine translation literacy. As I 590 00:48:43.180 --> 00:48:47.380 said, it's a customizable concept. You might find that the groups that you're working with 591 00:48:47.380 --> 00:48:51.550 would benefit from other information, and that's terrific; you 592 00:48:51.550 --> 00:48:55.720 might be well placed to determine what that should
be. 593 00:48:55.720 --> 00:49:01.320 So what I've actually started to do 594 00:49:01.320 --> 00:49:06.450 is to pilot a course on translation for non-translators, 595 00:49:06.450 --> 00:49:10.820 and machine translation literacy is just one part of that. So 596 00:49:10.820 --> 00:49:15.810 it's not a twelve-week course on machine translation literacy; it's a 597 00:49:15.810 --> 00:49:20.810 twelve-week course on translation, with machine translation literacy as one module within 598 00:49:20.810 --> 00:49:26.110 that. That's the only module I'm going to talk about today, but I didn't want to give the impression that the whole twelve 599 00:49:26.110 --> 00:49:31.040 weeks is about machine translation. It's a course that is 600 00:49:31.040 --> 00:49:36.220 new at our university, anyway, and it's a course that is basically 601 00:49:36.220 --> 00:49:40.980 aimed at everyone except people in the professional translation programme. 602 00:49:40.980 --> 00:49:45.880 So it's a first-year course, and it's very open; it has no prerequisites, 603 00:49:45.880 --> 00:49:51.100 and students from any other faculty can register to take the course. So 604 00:49:51.100 --> 00:49:56.380 we've got students who are coming from the education faculty, from engineering, health sciences, 605 00:49:56.380 --> 00:50:00.980 management science, social science, all in all. So far, this 606 00:50:00.980 --> 00:50:05.930 is the fourth iteration of the course. I first taught it 607 00:50:05.930 --> 00:50:10.910 in the fall semester last year, so I'm in the fourth iteration of the course now, 608 00:50:10.910 --> 00:50:16.390 and I've seen students from forty-seven different programs at the university come through, so it's 609 00:50:16.390 --> 00:50:20.940 a really diverse group that's there. And I also asked them 610 00:50:20.940 --> 00:50:26.030 to sort of self-declare. We have in Canada, of course, two official languages, 611 00:50:26.030 --> 00:50:31.140 French and English, but we get many
international students, and we have many students who 612 00:50:31.140 --> 00:50:35.880 have a heritage language; maybe they or their parents had emigrated from another country. 613 00:50:35.880 --> 00:50:41.380 So we see a diverse range of languages. I've asked 614 00:50:41.380 --> 00:50:45.980 the students to self-declare what their native or dominant languages are, 615 00:50:45.980 --> 00:50:51.540 and there are twenty-eight different languages in the course overall. 616 00:50:51.540 --> 00:50:56.080 And so we have seen approximately two hundred 617 00:50:56.080 --> 00:51:01.130 students go through the course so far, and I asked them to do a little survey on 618 00:51:01.130 --> 00:51:06.730 the machine translation literacy part of the course, and I've got about a hundred and seventy responses, 619 00:51:06.730 --> 00:51:11.420 so around an eighty-five percent response rate. So I'd like to share with you a little bit of 620 00:51:11.420 --> 00:51:16.160 what the results of that survey were. 621 00:51:16.160 --> 00:51:21.170 One of the first questions that I asked the students is: how often do you 622 00:51:21.170 --> 00:51:26.650 use machine translation? Is it something that's regular for you, or hardly ever? 623 00:51:26.650 --> 00:51:31.050 And you can see that, you know, over eighty 624 00:51:31.050 --> 00:51:36.100 percent of the students are using it at least once a month, and many of them are using it much 625 00:51:36.100 --> 00:51:40.890 more often than that, sometimes even every day. So they are definitely using 626 00:51:40.890 --> 00:51:45.870 the technology. It's not an alien concept to them; it's not something that 627 00:51:45.870 --> 00:51:50.620 they might use once a year. They're using it quite regularly, 628 00:51:50.620 --> 00:51:55.040 and they're not translators. These students are not translation students; 629 00:51:55.040 --> 00:51:59.440 they're people who are studying other things. 630 00:51:59.440 --> 00:52:05.020 And I
I asked them: in what area of your life 631 00:52:05.020 --> 00:52:09.520 do you use machine translation? Of course, I'm talking to students, so it's not terribly 632 00:52:09.520 --> 00:52:14.480 surprising that many of them are using it for their coursework, because that's what 633 00:52:14.480 --> 00:52:19.400 they spend most of their time doing: studying. And so they might be using machine 634 00:52:19.400 --> 00:52:24.790 translation for their courses, but they also use it for work; some of them might have a part-time job outside 635 00:52:24.790 --> 00:52:29.320 of their studies. And they're using it for leisure activities as well, 636 00:52:29.320 --> 00:52:34.220 so in all parts of their life they're using machine translation. 637 00:52:34.220 --> 00:52:39.500 I was also interested to know whether 638 00:52:39.500 --> 00:52:45.190 they use machine translation mostly to absorb information, 639 00:52:45.190 --> 00:52:49.450 so they're starting with a text in a language that they don't know and translating 640 00:52:49.450 --> 00:52:54.320 it into their own language, or whether they're trying to use machine translation to help them 641 00:52:54.320 --> 00:52:59.440 produce information to share with other people. And it seems that 642 00:52:59.440 --> 00:53:04.320 most students use it in both directions. So they are using machine translation both 643 00:53:04.320 --> 00:53:09.350 to assimilate information from another language, but also to 644 00:53:09.350 --> 00:53:14.490 disseminate information. These are actually two very different tasks, and 645 00:53:14.490 --> 00:53:19.360 machine translation may be better suited to one 646 00:53:19.360 --> 00:53:24.250 of those purposes than the other. So it's interesting to know that students are using it mainly in both 647 00:53:24.250 --> 00:53:26.100 directions.
648 00:53:26.100 --> 00:53:31.140 I also wanted to get a feel for, before 649 00:53:31.140 --> 00:53:35.740 they learn anything about machine translation literacy, how happy are they with machine 650 00:53:35.740 --> 00:53:40.330 translation? Do they think that it's working well to meet the needs that they have? 651 00:53:40.330 --> 00:53:45.200 And most students are reasonably satisfied. Very few are completely 652 00:53:45.200 --> 00:53:49.570 satisfied, very few are completely dissatisfied; most are coming down 653 00:53:49.570 --> 00:53:54.260 somewhere in the middle, either very or moderately satisfied. 654 00:53:54.260 --> 00:53:59.420 So that gives me hope that, 655 00:53:59.420 --> 00:54:04.220 you know, if they could learn something from machine translation literacy, perhaps 656 00:54:04.220 --> 00:54:09.130 their satisfaction level would go up a little bit by learning how to use it. Sometimes it's not that 657 00:54:09.130 --> 00:54:13.850 the tool is bad, but that we've used it for the wrong thing, or that we haven't shown 658 00:54:13.850 --> 00:54:18.760 good judgment around it. So I'm hopeful that this result means that 659 00:54:18.760 --> 00:54:23.700 they're not so disappointed in it that they're going to give up on it, but there's room for improvement 660 00:54:23.700 --> 00:54:28.160 to be able to use it better, more effectively. 661 00:54:28.160 --> 00:54:33.320 A big question 662 00:54:33.320 --> 00:54:38.340 I asked them: if there was no free online machine translation system, what would 663 00:54:38.340 --> 00:54:44.060 you do instead? And the really interesting thing for me here is that nobody 664 00:54:44.060 --> 00:54:48.530 would pay a professional. Now, we are talking about students; students 665 00:54:48.530 --> 00:54:53.970 don't have a lot of money. But for the types of needs that students have, 666 00:54:53.970 --> 00:54:58.520 they would not pay a professional, and this is really important because 667 00:54:58.520 --> 00:55:03.840 in the
translation community, I think there's often quite a lot of concern that machine 668 00:55:03.840 --> 00:55:09.070 translation is taking away business from professional translators. But this group 669 00:55:09.070 --> 00:55:13.810 would not be paying professional translators anyway, so there's really no need for 670 00:55:13.810 --> 00:55:18.700 a sort of false competition between professional translators and 671 00:55:18.700 --> 00:55:24.120 machine translation systems, at least not in the context of students. 672 00:55:24.120 --> 00:55:29.700 Most of them said, oh, I'd ask a friend to help, or a colleague; a couple would translate 673 00:55:29.700 --> 00:55:34.900 it themselves, and quite a few would just do nothing. So it's 674 00:55:34.900 --> 00:55:40.230 really not, well, this use case anyway is not a threat to professional 675 00:55:40.230 --> 00:55:46.300 translation. This is a type of translation where the needs were simply going unmet before, 676 00:55:46.300 --> 00:55:50.950 so it's not business that's being taken away from translators. 677 00:55:50.950 --> 00:55:55.980 When I asked them, out of all the different things 678 00:55:55.980 --> 00:56:01.230 that we learned in the machine translation literacy module, which one surprised 679 00:56:01.230 --> 00:56:06.310 you the most, or what was the most novel thing that you learned that you hadn't expected about machine 680 00:56:06.310 --> 00:56:11.410 translation, the most common answer was, um, they were shocked to find 681 00:56:11.410 --> 00:56:16.080 out that Google or the other translation companies could keep their data, 682 00:56:16.080 --> 00:56:21.260 and a lot of them said they're going to be a little more careful now about what they enter into 683 00:56:21.260 --> 00:56:26.200 a machine translation system. It never occurred to me, a lot of students said; it didn't occur 684 00:56:26.200 --> 00:56:31.240 to me, but now that you point it out, I think, yeah, of course they're keeping my data, what was,
what 685 00:56:31.240 --> 00:56:36.760 was I thinking? But, um, yeah, that was the most common one, but there was quite a variety, 686 00:56:36.760 --> 00:56:41.200 as you see. So that again made me think that machine translation literacy 687 00:56:41.200 --> 00:56:47.100 is worth it, because they are learning different things. Some people knew X, some people knew Y, but 688 00:56:47.100 --> 00:56:52.950 nobody knew everything, and almost everyone learned something new in the machine translation literacy module, 689 00:56:52.950 --> 00:56:55.040 it seems to me. 690 00:56:55.040 --> 00:56:59.850 And finally, 691 00:56:59.850 --> 00:57:04.730 I asked them two questions at the end. I said: you are not 692 00:57:04.730 --> 00:57:10.110 a translator, you're not planning to become a translator, but you do use machine translation; 693 00:57:10.110 --> 00:57:14.530 do you think that this module on machine translation literacy, 694 00:57:14.530 --> 00:57:19.210 or the concept of machine translation literacy in general, is important for people who 695 00:57:19.210 --> 00:57:24.090 are not planning to become language professionals? And most 696 00:57:24.090 --> 00:57:28.870 of the students said yes, it was very important. They really felt that they had learned something 697 00:57:28.870 --> 00:57:33.520 and that it would change their approach to using machine translation in the future. 698 00:57:33.520 --> 00:57:38.570 Hardly anyone said it wasn't worthwhile. Um, no, not everyone thought it was super important, 699 00:57:38.570 --> 00:57:43.820 but most people thought it was worth learning; they felt it was time well spent. 700 00:57:43.820 --> 00:57:48.870 And the very last question I asked was: do you think, then, 701 00:57:48.870 --> 00:57:53.680 based on your experience (because this is a pilot project at the moment), do 702 00:57:53.680 --> 00:57:59.070 you think that we should make this a permanent part of the university's offering, 703 00:57:59.070 --> 00:58:03.660 and that we should be teaching machine
translation literacy to all students, even 704 00:58:03.660 --> 00:58:09.150 the ones who aren't planning on a career in the language professions? And again I was really, 705 00:58:09.150 --> 00:58:13.760 um, pleasantly surprised by the kind of positive 706 00:58:13.760 --> 00:58:18.960 encouragement for doing this, for saying yes, it does seem to be worthwhile. People who aren't 707 00:58:18.960 --> 00:58:23.700 in the language professions are still users of this technology and are perhaps more in 708 00:58:23.700 --> 00:58:27.980 need of it, because they won't get this type of education anywhere else. 709 00:58:27.980 --> 00:58:32.380 So that was the last question in the survey. 710 00:58:32.380 --> 00:58:36.130 So, just to wrap things up with a sort of takeaway: 711 00:58:36.130 --> 00:58:41.410 if I could say anything about machine translation, I would 712 00:58:41.410 --> 00:58:46.830 say it is not going away. It is here, it's free, it's easy 713 00:58:46.830 --> 00:58:52.020 to use. Of course people are going to use it. They are using it; as I said, 714 00:58:52.020 --> 00:58:57.130 the most recent estimate from Google themselves was that there 715 00:58:57.130 --> 00:59:02.670 are a billion users of this technology. So even if we think, no, 716 00:59:02.670 --> 00:59:08.000 and by "we" I mean people in the language professions, even if we think that people outside the language professions 717 00:59:08.000 --> 00:59:13.260 shouldn't use this technology, that they should turn to professionals for a high-quality service 718 00:59:13.260 --> 00:59:18.130 instead, it doesn't matter what we think. They are using it, 719 00:59:18.130 --> 00:59:23.370 and they're using it often, and they're using it for a variety of purposes. 720 00:59:23.370 --> 00:59:28.320 So I think that we should be less concerned about this idea 721 00:59:28.320 --> 00:59:33.330 of competing with machine translation. Many of the 722 00:59:33.330 --> 00:59:38.480 things that people use it for, they would not have hired a professional translator for
anyway, so it's 723 00:59:38.480 --> 00:59:43.820 not taking away any business. And another point, 724 00:59:43.820 --> 00:59:48.800 which I think is also really important: things that are obvious to translators are not 725 00:59:48.800 --> 00:59:53.610 obvious to non-translators, and this echoes back to my earlier point: 726 00:59:53.610 --> 00:59:58.910 why would we think that people who don't have a background or training 727 00:59:58.910 --> 01:00:03.700 in translation would be able to use this tool in the smartest way possible 728 01:00:03.700 --> 01:00:08.680 unless we help them? So I do think that there is a sort of social responsibility 729 01:00:08.680 --> 01:00:13.870 on the part of the language professions, because I think we have been guilty of a little bit 730 01:00:13.870 --> 01:00:18.200 of slamming the technology, saying that it's really 731 01:00:18.200 --> 01:00:23.610 not good, you know, you really need to be careful, 732 01:00:23.610 --> 01:00:27.700 you should be hiring a professional translator instead. And I think that we need 733 01:00:27.700 --> 01:00:32.460 to change our tone a little bit and recognize that we 734 01:00:32.460 --> 01:00:37.220 can play a role in helping people, and that it's not harming us; helping others is 735 01:00:37.220 --> 01:00:41.930 not harming us. And so, um, I think that made me 736 01:00:41.930 --> 01:00:47.210 think about that a little more, and I'm actually interested in where and when we should be doing 737 01:00:47.210 --> 01:00:52.200 machine translation literacy interventions. I've been working 738 01:00:52.200 --> 01:00:57.230 a lot with first-year university students, but it seems to me that 739 01:00:57.230 --> 01:01:02.580 it might be worth introducing this to people who are even younger, because they're arriving at university 740 01:01:02.580 --> 01:01:07.330 already having used machine translation. So maybe we should be thinking 741 01:01:07.330 --> 01:01:12.560 about integrating it
into high schools, or maybe even middle school, I don't know. 742 01:01:12.560 --> 01:01:17.190 To do that, we need to reach an important group, and what I'm 743 01:01:17.190 --> 01:01:22.750 sort of trying to tackle now is the idea of teaching the teachers. 744 01:01:22.750 --> 01:01:27.460 We can't necessarily put a translator in every classroom, but if we could get 745 01:01:27.460 --> 01:01:32.130 involved in maybe integrating machine translation literacy education 746 01:01:32.130 --> 01:01:37.210 into teacher training, then the teachers themselves could carry that forward to their students. 747 01:01:37.210 --> 01:01:42.060 And I think that it could be part of a sort of media literacy, 748 01:01:42.060 --> 01:01:46.230 digital literacy, information literacy kind of package, and we know that 749 01:01:46.230 --> 01:01:51.260 schools already have those, so we just need to inject a little bit of machine translation 750 01:01:51.260 --> 01:01:55.710 literacy into that overall package to reach a much 751 01:01:55.710 --> 01:02:00.120 wider group of the general public. 752 01:02:00.120 --> 01:02:04.670 And I will end there. I'll put up just one last slide where people can 753 01:02:04.670 --> 01:02:09.560 contact me if you're interested in knowing more about the machine translation literacy 754 01:02:09.560 --> 01:02:13.870 project. We have a website and we're on Twitter, and you can 755 01:02:13.870 --> 01:02:18.250 of course email me at my home institution, 756 01:02:18.250 --> 01:02:20.930 and there's the book, of course. 757 01:02:20.930 --> 01:02:24.180 [inaudible]
758 01:02:24.180 --> 01:02:29.130 Oh yes, I have to say, um, so I'm 759 01:02:29.130 --> 01:02:34.280 attached to a school of translation and interpretation, and I'm very much in translator 760 01:02:34.280 --> 01:02:38.780 training, so I don't principally do language teaching, but I'm interested in having more 761 01:02:38.780 --> 01:02:43.850 conversations with people who do language teaching, because I agree with you; I think there are parallels, 762 01:02:43.850 --> 01:02:48.700 and I think that we could, um, learn from each other and perhaps 763 01:02:48.700 --> 01:02:53.290 reach a wider audience by working together. Yeah. 764 01:02:53.290 --> 01:02:55.130 [inaudible] 765 01:02:55.130 --> 01:03:00.880 No, I really hope so. I do think it's something that many groups could participate 766 01:03:00.880 --> 01:03:06.060 in, and I really love the Language Campus set-up that you 767 01:03:06.060 --> 01:03:11.820 have here, where you bring people together from the language department but also from education, and also 768 01:03:11.820 --> 01:03:16.910 people with an interest in technologies. And I do think that this is kind of an 769 01:03:16.910 --> 01:03:22.130 interdisciplinary issue, and that we will 770 01:03:22.130 --> 01:03:27.140 do a better job of solving it or tackling it by working 771 01:03:27.140 --> 01:03:32.240 together, because I think translation is one facet of it, but 772 01:03:32.240 --> 01:03:36.820 it's also about teaching, it's also about, yeah, 773 01:03:36.820 --> 01:03:41.890 the development of critical judgment, which can be 774 01:03:41.890 --> 01:03:46.450 honed through other avenues than only through translation. 775 01:03:46.450 --> 01:03:51.580 Yeah, I do actually strongly 776 01:03:51.580 --> 01:03:57.310 believe that; I think there's been a sort of false idea of competition between 777 01:03:57.310 --> 01:04:01.750 professional translators and machine translation. Partly, I mean, partly 778 01:04:01.750 --> 01:04:06.800 it developed way back in that period, in the
years following the Second World War. 779 01:04:06.800 --> 01:04:11.850 Part of the problem was that at the time, um, there was nothing 780 01:04:11.850 --> 01:04:16.940 like computational linguistics; there was nothing like language technology at the time. 781 01:04:16.940 --> 01:04:21.690 All of the work in machine translation was being done only by those who were really computer 782 01:04:21.690 --> 01:04:26.860 scientists, or often mathematicians or electrical engineers, and 783 01:04:26.860 --> 01:04:31.770 they thought that machine translation would be an easy thing for them to 784 01:04:31.770 --> 01:04:36.660 solve; they treated translation like solving a code. So at the time, um, 785 01:04:36.660 --> 01:04:41.760 the people working on machine translation were perhaps a little bit naive about 786 01:04:41.760 --> 01:04:46.980 what was involved, because they weren't translators themselves; they didn't know. And so 787 01:04:46.980 --> 01:04:52.190 the messaging that went out in the early years was quite destructive. I think they really 788 01:04:52.190 --> 01:04:57.640 over-promised and then under-delivered, and so it left a bad 789 01:04:57.640 --> 01:05:01.980 taste, really, in the mouths of translators to have 790 01:05:01.980 --> 01:05:07.300 been told, you know, this technology is going to put you out of business. That was the initial 791 01:05:07.300 --> 01:05:12.020 message that came out, and of course we in the translation professions realize it's much more 792 01:05:12.020 --> 01:05:17.150 complex than code breaking; it's not something that's an easy fix. Um, 793 01:05:17.150 --> 01:05:22.080 but it set a bad tone right from the beginning, which was unfortunate. But I think we have enough 794 01:05:22.080 --> 01:05:27.050 distance, enough history now, enough opportunities for collaboration that we do need 795 01:05:27.050 --> 01:05:32.240 to change that from such a negative tone to saying, hey, we have valuable 796 01:05:32.240 --> 01:05:36.400 skills to offer, we can help you learn, and why
not help you learn? 797 01:05:36.400 --> 01:05:40.800 It's not, it's not harming us. So, now, 798 01:05:40.800 --> 01:05:45.590 ah, yes, I'm sure, though I don't personally know 799 01:05:45.590 --> 01:05:50.910 how to do it, um, but I'm quite certain that there are tools out there; they're called web scrapers, 800 01:05:50.910 --> 01:05:55.040 so, say, 801 01:05:55.040 --> 01:06:00.220 scrape, scraper, and, um, I think if you look into the literature 802 01:06:00.220 --> 01:06:05.200 on the digital humanities, you'll find information about web scraping in the digital 803 01:06:05.200 --> 01:06:10.110 humanities. Yeah, there is a lot of, especially, government 804 01:06:10.110 --> 01:06:15.350 material; there's a lot of free, open, um, like, non-copyrighted material 805 01:06:15.350 --> 01:06:20.250 when it's government websites and things like that. It's in the open domain, and so you 806 01:06:20.250 --> 01:06:25.150 don't need permission, you don't need a license, that sort of thing. What's 807 01:06:25.150 --> 01:06:30.370 in the free or open domain, you can use. And a lot of, I mean, governments 808 01:06:30.370 --> 01:06:35.590 of countries that are officially bilingual will have a lot of that information. 809 01:06:35.590 --> 01:06:40.220 Um, that's a very particular text type, like, you're only getting government stuff, but it's enough to 810 01:06:40.220 --> 01:06:45.200 play with and to kind of start. Yeah. 811 01:06:45.200 --> 01:06:50.210 So in the scholarly community, this is actually becoming 812 01:06:50.210 --> 01:06:55.410 a very active area of interest. What we're seeing is that 813 01:06:55.410 --> 01:07:00.560 there's pushback now. For many years English has been the lingua franca of scholarly 814 01:07:00.560 --> 01:07:05.250 publishing, and, you know, it makes sense on the one hand to 815 01:07:05.250 --> 01:07:10.570 use a lingua franca, because there are many benefits, but on the other hand there are also many drawbacks, and 816 01:07:10.570 --> 01:07:15.220 the
drawbacks are particularly for the ninety-five percent of researchers who are not 817 01:07:15.220 --> 01:07:20.280 native speakers of English. So, um, you know, it's quite unbalanced. 818 01:07:20.280 --> 01:07:25.370 Just earlier this month, UNESCO issued a recommendation on 819 01:07:25.370 --> 01:07:30.230 open science, and one of the things that they recommended was that science should 820 01:07:30.230 --> 01:07:35.230 be more multilingual; that forcing all of science through a single 821 01:07:35.230 --> 01:07:40.470 common language leads in some cases to poor science and can have very 822 01:07:40.470 --> 01:07:45.560 detrimental effects. And so they are advocating for multilingual 823 01:07:45.560 --> 01:07:50.450 science. Now, the problem is that multilingualism 824 01:07:50.450 --> 01:07:55.360 and a lingua franca each have strengths and weaknesses; the strength of one is the weakness 825 01:07:55.360 --> 01:08:00.330 of the other, so there's no perfect solution. But since we have 826 01:08:00.330 --> 01:08:05.700 seen already the downside of using a single language, a common 827 01:08:05.700 --> 01:08:10.710 language, for science, I think people are interested in exploring whether 828 01:08:10.710 --> 01:08:15.920 the other option will be better, will solve some of the problems. And I 829 01:08:15.920 --> 01:08:20.580 think machine translation is poised to become an important part of that, 830 01:08:20.580 --> 01:08:25.380 but it can't ever be seen as the panacea; it's not the solution, 831 01:08:25.380 --> 01:08:30.440 and it's not going to fix everything. But I think it can play an important role, and I think 832 01:08:30.440 --> 01:08:36.030 this literacy is equally important, because we're going to again be seeing 833 01:08:36.030 --> 01:08:40.510 scholars who are scholars of other disciplines, not translation but scholars 834 01:08:40.510 --> 01:08:45.870 of everything else, and again, why would they know how to use machine
translation 835 01:08:45.870 --> 01:08:50.490 sensibly? Some of the challenges that we're going to face are, oh, the 836 01:08:50.490 --> 01:08:55.890 domain, the low-resource domain, because the more specialized our work, the fewer articles 837 01:08:55.890 --> 01:09:00.820 have been written about it. Um, the purpose of research is 838 01:09:00.820 --> 01:09:05.710 often innovation, meaning nothing has been written about it because it's a brand-new thing. 839 01:09:05.710 --> 01:09:10.690 Um, so those are some of the challenges that we're going to have to, I think, take on, if 840 01:09:10.690 --> 01:09:15.540 we want to really see machine translation be successful in scholarly communication. How do 841 01:09:15.540 --> 01:09:20.620 we build the corpora? How do we amass enough data in all the languages that we need and 842 01:09:20.620 --> 01:09:26.260 all the subject fields that we need? And I think it's going to require cooperation, because 843 01:09:26.260 --> 01:09:30.930 the other thing about scholarly communication right now is that most of the writing 844 01:09:30.930 --> 01:09:35.820 is behind a paywall, right? Almost all of the journals require subscription 845 01:09:35.820 --> 01:09:41.140 access. The only thing that's available is the abstract, which is, you know, something, 846 01:09:41.140 --> 01:09:45.770 but it's not enough. So we're going to need collaboration from 847 01:09:45.770 --> 01:09:51.470 the publishing industry to make the data available for corpora, 848 01:09:51.470 --> 01:09:56.150 in order to have something to feed the machine translation system for it to learn from. 849 01:09:56.150 --> 01:10:00.830 So it's definitely going to be a collaborative effort. And, um, 850 01:10:00.830 --> 01:10:05.700 open access has taken off, open educational resources, open publishing, 851 01:10:05.700 --> 01:10:11.240 so that might help a lot. If we could have more people publishing 852 01:10:11.240 --> 01:10:16.060 in open access, then the data becomes available for a system 853
01:10:16.060 --> 01:10:21.070 such as machine translation. And I was just really, really interested by this 854 01:10:21.070 --> 01:10:25.940 recommendation from UNESCO just this month, that we need to, 855 01:10:25.940 --> 01:10:31.600 they framed it in terms of, um, inclusion, right? To make science 856 01:10:31.600 --> 01:10:36.650 more inclusive, we need to have more multilingual 857 01:10:36.650 --> 01:10:41.110 science; we can't expect everyone, on top of learning their 858 01:10:41.110 --> 01:10:46.260 discipline, also to become not just so-so English speakers but, like, really 859 01:10:46.260 --> 01:10:51.140 high-level English speakers who can write academic articles, and it's 860 01:10:51.140 --> 01:10:56.130 just not reasonable. So, to make science more inclusive... it basically 861 01:10:56.130 --> 01:11:01.450 turns a lot of people off, people who might be really good scientists but who don't want to be conference 862 01:11:01.450 --> 01:11:06.290 presenting in English, you know, having to read everything all the time in English, having to pay 863 01:11:06.290 --> 01:11:11.330 for their work to be translated into English, having everything take three times as 864 01:11:11.330 --> 01:11:16.480 long because they're doing it in another language. Like, it's not an inclusive model, and so 865 01:11:16.480 --> 01:11:21.410 if we can get away from an English, or any, I mean, it happens 866 01:11:21.410 --> 01:11:26.470 to be English, but from a lingua franca model for science, and towards a multilingual model. 867 01:11:26.470 --> 01:11:31.700 And when I say science, I mean research. I don't mean, um, pure 868 01:11:31.700 --> 01:11:36.470 and natural sciences; I mean all types of research. 869 01:11:36.470 --> 01:11:40.910 Does that answer the question? I went off on kind of a rant there, but I'm so excited 870 01:11:40.910 --> 01:11:44.700 by the UNESCO declaration. Now, 871 01:11:44.700 --> 01:11:49.700 is it time to wrap up now? Oh, okay. 872
01:11:54.850 and kate thank you so much again for the invitation to speak and for the great stout participation 873 01:11:54.850 --> 01:11:58.230 in interaction added was lovely thank you ma'am