In this second post, we will focus on this paper:
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Semantically equivalent adversarial rules for debugging NLP models." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018.
Robustness is a central concern in engineering. Our suspension bridges need to stand against strong wind so that they do not collapse like the Tacoma Narrows Bridge [video]. Our nuclear reactors must be fault tolerant so that the Fukushima Daiichi incident will not happen again in the future [link].
As we become increasingly reliant on a piece of technology, whether suspension bridges, nuclear power, or in this case NLP models, we must raise the level of trust we place in it. Robustness is precisely the requirement we need to impose on such systems.
Early work from Jia & Liang (2017) shows that NLP models are not immune to small, negligible-to-humans perturbations in text: a simple addition or deletion can break the model and force it to produce nonsensical answers. Other work, such as Belinkov & Bisk and Ebrahimi et al., showed that a systematic perturbation as small as dropping or replacing a character is sufficient to break a model. Introducing noise to sequence data is not always bad, though: earlier work by Xie et al. shows that training machine translation or language models with word/character-level perturbation (noising) actually improves performance.
However, it is hard to call these perturbed examples "adversarial examples" in the original conception of Ian Goodfellow. This paper proposes a way to characterize an adversarial example in text via two properties:
- Semantic equivalence of the two sentences: \(\text{SemEq}(x, x')\)
- Perturbed label prediction: \(f(x) \neq f(x')\)
In our discussion, people pointed out that from a linguistic perspective, it is very difficult to define "semantic equivalence" because we do not have a precise and objective definition of "meaning". That is to say, even though two sentences might elicit the same effect for a particular task, they do not have to be synonymous. A more nuanced discussion of paraphrases in English can be found in What Is a Paraphrase? [link] by Bhagat & Hovy (2012). In this paper, semantic equivalence is operationalized as what humans (MTurkers) judged to be "equivalent".
Semantically Equivalent Adversaries (SEAs)
Ribeiro et al. argue that only a sequence satisfying both conditions is a true adversarial example in text. They translate these criteria into a conjunctive form using an indicator function:
$$
\text{SEA}(x, x') = \unicode{x1D7D9}[\text{SemEq}(x, x') \wedge f(x) \neq f(x')] \tag{1}
$$
In this paper, semantic equivalence is measured by the probability of paraphrasing, as defined in the multilingual multi-pivot paraphrasing paper from Lapata et al. (2017). Pivoting is a technique in statistical machine translation proposed by Bannard and Callison-Burch (2005): if two English strings \(e_1\) and \(e_2\) can be translated into the same French string \(f\), then they can be assumed to have the same meaning.
The pivot scheme is depicted by the generative model on the left, which assumes conditional independence between \(e_1\) and \(e_2\) given \(f\): \(p(e_2 \vert e_1, f) = p(e_2 \vert f)\). Multi-pivoting is depicted by the model on the right: it translates one English sentence into multiple French sentences, and translates back to generate the paraphrase. The back-translation step of multi-pivoting can be a simple decoder average: each decoder takes a French string, and the overall output probability for the next English token is the weighted sum of the probabilities from every decoder.
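To make this concrete, here is a minimal sketch of the decoder-averaging step. The decoder objects and their `next_token_distribution` method are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

def multipivot_next_token_probs(decoders, french_pivots, weights=None):
    """Average next-token distributions across pivot decoders.

    decoders: hypothetical decoder objects, one per French pivot, each
    exposing next_token_distribution(french_string) -> array over the
    English vocabulary. weights: optional mixture weights.
    """
    if weights is None:
        weights = np.ones(len(decoders)) / len(decoders)
    # Each decoder conditions on its own French pivot; the multipivot
    # probability of the next English token is the weighted sum of the
    # per-decoder probabilities.
    dists = [d.next_token_distribution(f) for d, f in zip(decoders, french_pivots)]
    return np.average(dists, axis=0, weights=weights)
```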
Paraphrase Probability Reweighting
Assuming the unnormalized logit from the paraphrasing model is \(\phi(x' \vert x)\), and supposing \(\prod_x\) is the set of paraphrases that the model could generate given \(x\), the probability of a particular paraphrase can be written as below:
$$
p(x' \vert x) = \frac{\phi(x' \vert x)}{\sum_{i \in \prod_x} \phi(i \vert x)}
$$
Note that in the denominator, all sentences that can be generated (including regenerating the original sentence) share the probability mass. If a sentence has many easy-to-generate paraphrases (indicated by high \(\phi\) values), then \(p(x \vert x)\) will be small, as will all other \(p(x' \vert x)\). Dividing \(p(x' \vert x)\) by \(p(x \vert x)\) then gives a large value (closer to 1). For a sentence that is difficult to paraphrase, \(p(x \vert x)\) should be rather large compared to \(p(x' \vert x)\), so the ratio gives a much smaller value (closer to 0).
Based on this intuition, Ribeiro et al. propose to compute a semantic score \(S(x, x')\) as a measure of paraphrase quality:
$$
S(x, x') = \min\left(1, \frac{p(x' \vert x)}{p(x \vert x)}\right)
$$

$$
\text{SemEq}(x, x') = \unicode{x1D7D9}[S(x, x') \geq \tau]
$$
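A small sketch of how this score could be computed from the unnormalized scores. The `phi` mapping and the threshold value are assumptions for illustration, not the paper's implementation:

```python
def semantic_score(phi, candidates, x, x_prime):
    """Compute S(x, x') from unnormalized paraphrase scores.

    phi maps each candidate (including x itself) to its unnormalized
    score phi(candidate | x) -- a hypothetical interface.
    """
    z = sum(phi[c] for c in candidates)  # shared probability mass
    p_x = phi[x] / z                     # p(x | x): regenerating the original
    p_x_prime = phi[x_prime] / z         # p(x' | x)
    return min(1.0, p_x_prime / p_x)

def sem_eq(phi, candidates, x, x_prime, tau=0.8):
    # Indicator SemEq(x, x') = 1[S(x, x') >= tau]; tau = 0.8 is an
    # assumed placeholder, not the paper's setting.
    return semantic_score(phi, candidates, x, x_prime) >= tau
```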
A simple scheme to generate adversarial sentences that satisfy Equation 1 is: ask the paraphrase model to generate paraphrases of a sentence \(x\), then test whether any of these paraphrases changes the model's prediction: \(f(x') \neq f(x)\).
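As a sketch, this trial-and-error search could look like the following, assuming a hypothetical `model.predict` and a `paraphraser.paraphrases(x)` that yields candidates along with their semantic scores:

```python
def generate_seas(model, paraphraser, x, tau=0.8):
    """Search for Semantically Equivalent Adversaries (SEAs) of x."""
    original_pred = model.predict(x)
    seas = []
    for x_prime, score in paraphraser.paraphrases(x):
        # Both conditions of Equation 1: semantically equivalent,
        # and the predicted label flips.
        if score >= tau and model.predict(x_prime) != original_pred:
            seas.append((x_prime, score))
    return seas
```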
Semantically Equivalent Adversarial Rules (SEARs)
SEAs are adversarial examples generated independently for each example. In this step, the authors lay out how to convert these local SEAs into global rules (SEARs). A rule in this paper is a simple discrete transformation \(r = (a \rightarrow c)\). For example, with \(r = (\text{movie} \rightarrow \text{film})\), we get \(r\)("Great movie!") = "Great film!".
Given a pair of texts \((x, x')\) where \(\text{SEA}(x, x') = 1\), Ribeiro et al. select the minimal contiguous span of text that turns \(x\) into \(x'\), include the immediate context (one word before and after the span), and annotate the sequence with POS (part-of-speech) tags. The last step is to generate the product of combinations between raw words and their POS tags. A step-wise example follows:
"What color is the tray?" -> "Which color is the tray?"
Step 1: (What -> Which)
Step 2: (What color -> Which color)
Step 3: (What color -> Which color), (What NOUN -> Which NOUN), (WP color -> Which color), (What color -> WP color)
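A simplified sketch of this generalization step (keeping the right-hand side concrete, which slightly simplifies the paper's procedure):

```python
from itertools import product

def candidate_rules(span_words, span_pos, replacement):
    """Generate candidate rules from a changed span and its POS tags.

    Each position in the antecedent can be either the raw word or its
    POS tag; the product of these choices yields the candidates.
    """
    rules = []
    for choice in product(*[(w, p) for w, p in zip(span_words, span_pos)]):
        antecedent = " ".join(choice)
        rules.append((antecedent, replacement))
    return rules

# Example: the span "What color" with POS tags ("WP", "NOUN"),
# replaced by "Which color".
print(candidate_rules(["What", "color"], ["WP", "NOUN"], "Which color"))
# [('What color', 'Which color'), ('What NOUN', 'Which color'),
#  ('WP color', 'Which color'), ('WP NOUN', 'Which color')]
```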
Since this process is applied to every pair \((x, x')\), and we assume humans are only willing to go through \(B\) rules, Ribeiro et al. propose to filter the candidates such that \(\vert R \vert \leq B\). The criteria are:
- High probability of producing semantically equivalent sentences: this is measured by a population statistic \(E_{x \sim p(x)}[\text{SemEq}(x, r(x))] \geq 1 - \delta\). Simply put, when the rule is applied, most \(x\) in the corpus should be translated into semantically equivalent paraphrases. In the paper, \(\delta = 0.1\).
- High adversary count: a rule \(r\) must also generate paraphrases that alter the model's prediction, and the semantic similarity between the paraphrases should be high. This can be measured by \(\sum_{x \in X} S(x, r(x)) \, \text{SEA}(x, r(x))\).
- Non-redundancy: rules should be diverse and cover as many \(x\) as possible.
To satisfy criteria 2 and 3, Ribeiro et al. propose a submodular optimization objective, which can be solved with a greedy algorithm that is guaranteed to come within a constant factor of the optimum:
$$
\max_{R, \, \vert R \vert \leq B} \; \sum_{x \in X} \max_{r \in R} S(x, r(x)) \, \text{SEA}(x, r(x))
$$
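A sketch of the standard greedy algorithm for this kind of coverage objective, with `score` and `sea` as hypothetical callables standing in for the precomputed values \(S(x, r(x))\) and \(\text{SEA}(x, r(x))\):

```python
def greedy_rule_selection(candidates, X, B, score, sea):
    """Greedily pick at most B rules to maximize the coverage objective."""
    selected = []
    best = {x: 0.0 for x in X}  # inner max over the rules selected so far
    for _ in range(B):
        if not candidates:
            break
        # Marginal gain of adding rule r to the current selection.
        def gain(r):
            return sum(max(0.0, score(x, r) * sea(x, r) - best[x]) for x in X)
        r_star = max(candidates, key=gain)
        if gain(r_star) <= 0:
            break  # no remaining rule improves coverage
        selected.append(r_star)
        candidates = [r for r in candidates if r != r_star]
        for x in X:
            best[x] = max(best[x], score(x, r_star) * sea(x, r_star))
    return selected
```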
The overall algorithm is described below:
Experiment and Validation
The key metric Ribeiro et al. measure is the percentage of Flips: out of the instances in the validation set that are predicted correctly, how many are predicted incorrectly after the rule is applied.
One comment on this metric during our discussion was that it does not indicate how many examples are affected by the rule. For example, a rule that changes "color" to "colour" might only have a flip rate of 2.2% on the VQA dataset, but this could simply be because only 2.2% of instances in the VQA validation set contain the word "color"; in that case, the rule actually has a 100% success rate at generating adversarial examples.
The paper shows some really good discrete rules that can generate adversarial text examples:
Human-in-the-loop
Ribeiro et al. conducted experiments with humans. Bringing humans into the loop can serve two purposes: humans can judge whether rules actually generate paraphrases (beyond the semantic scoring model provided by Lapata et al.), and humans can decide whether the perturbations induced by rules are actually meaningful.
They first evaluate the quality of SEAs. For 100 correctly predicted instances in the validation set, they create three sets for comparison: 1) adversaries created entirely by human MTurkers, called "humans"; 2) adversaries generated purely by the paraphrasing model described above, called "SEA"; 3) SEAs generated by the algorithm, but with the \(S(x, x')\) criterion replaced by human judgments of similarity ("HSEA").
They show that SEA narrowly beats humans (18% vs. 16%), but combined with human judgments, HSEA outperforms humans by a large margin (24% vs. 13%).
Then they evaluate the global rules, SEARs. This time, they invite "experts" to use an interactive web interface to create global rules. They define experts as students or faculty who have taken one graduate-level NLP or ML class. Strictly speaking, the experts should perhaps have been linguistics students.
Experts get immediate feedback during rule creation: they can see how many instances (out of 100) are perturbed by their rule, and how many instances have their predicted label flipped. For a fair comparison, they are asked to create as many rules as they want but to select the 10 best, and each expert is given roughly 15 minutes to create rules. They were also asked to evaluate SEARs and select the 10 rules that best preserve semantic equivalence.
The results are not surprising: SEARs are much better at achieving a high flip percentage, and the combined effort of human and machine beats either alone. The authors also compared how many seconds, on average, it takes an expert to create rules versus to evaluate rules created by the machine.
Finally, the paper shows a simple way to fix these bugs: simply perturb the training set with the human-accepted rules and retrain, which reduces the error rate from 12.6% to 1.4% on VQA, and from 12.6% to 3.4% on sentiment analysis.
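A minimal sketch of this data-augmentation fix, assuming rules are represented as simple (pattern, replacement) string pairs:

```python
def augment_with_rules(train_set, accepted_rules):
    """Perturb the training set with human-accepted rules.

    train_set: iterable of (text, label) pairs; accepted_rules:
    (pattern, replacement) string pairs (assumed representation).
    Augmented copies keep the original label, since the rules are
    taken to be semantics-preserving.
    """
    augmented = list(train_set)
    for text, label in train_set:
        for pattern, replacement in accepted_rules:
            if pattern in text:
                augmented.append((text.replace(pattern, replacement), label))
    return augmented
```

Retraining on the augmented set is what closes most of the bugs that the rules exposed.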
Wrap up
This paper uses paraphrasing models both to measure semantic similarity and to generate semantically equivalent sentences. As mentioned above, machine-translation-based paraphrasing perturbs the sentence only locally, whereas humans generate semantically equivalent adversaries with more substantial perturbations.
Another limitation is that gradient-based adversarial example generation is more guided, whereas the method proposed in this paper is essentially trial and error (keep generating paraphrases until one perturbs the model's prediction). On the flip side, this method applies to black-box models without access to gradients, and is thus more general than gradient-based approaches.
This paper provides a clear framework and proposes clear properties that adversarial text examples should abide by. The definition is very compatible with adversarial examples in computer vision. However, this framework covers only one particular kind of adversarial example. An obvious kind not covered by this method is operations such as adding or deleting sentences, which are important for attacking QA models.