عجفت الغور

natural language inference lecture

Tags: ling-ga 1012 (nlp and semantics)

intro

  • Motivating question: can neural network methods do anything that resembles compositional semantics?

    • What’s our metric? How do we know we’ve accomplished a goal?
  • also sometimes called recognizing textual entailment (RTE), which is the same task as NLI

  • example: premise -> hypothesis, does the premise entail the hypothesis?

    • Ido Dagan 05

      We say that T entails H if, typically, a human reading T would infer that H is most likely true

    • NLI entailment is much looser than semantic entailment

      • same looseness applies to contradiction
  • what is the meaning of a sentence?

    • this is unproductive, we can’t really know what “““meaning””” is
    • alternative question: what concrete phenomena do you have to deal with to understand a sentence?
      • focus on behaviors instead

      • for NLI to work, you need to understand a lot:

      • NLI is an ungrounded task: we do not require systems to look at situations outside of language

  • if you know the truth condition of two sentences, can you work out if one entails the other?

  • NLI asks us to reason about sentences even if we don’t know what they mean

datasets

learning

Feature based models

  • logistic regression, bag-of-words features on the hypothesis, bag of word-pairs features to capture alignment, tree kernels
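
A minimal sketch of the first two feature types, assuming whitespace tokenization and lowercase matching. The function name `nli_features` and the feature-name prefixes are made up for illustration; tree kernels are not covered. The resulting sparse counts could then be fed to any logistic regression implementation.

```python
from collections import Counter


def nli_features(premise: str, hypothesis: str) -> Counter:
    """Sparse count features for one premise/hypothesis pair (illustrative only)."""
    p_tokens = premise.lower().split()
    h_tokens = hypothesis.lower().split()
    feats = Counter()

    # Bag-of-words features computed on the hypothesis alone.
    for h in h_tokens:
        feats[f"hyp={h}"] += 1

    # Cross word-pair features: every (premise word, hypothesis word) pair.
    # This is a crude stand-in for alignment; the classifier has to learn
    # which pairs (e.g. dog|animal) are informative.
    for p in p_tokens:
        for h in h_tokens:
            feats[f"pair={p}|{h}"] += 1

    return feats


feats = nli_features("a dog runs", "an animal moves")
print(sorted(feats))
```

The word-pair features grow with the product of the two sentence lengths, so this kind of model usually needs feature hashing or frequency cutoffs on real data.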

natural logic

  • rules based
  • non ML work on NLI is here
  • formal logic for deriving entailments between a pair of sentences
  • operates directly on words
  • generally sound, entailment here means actual entailment
    • but not complete, cannot detect some entailments
    • requires clear structural parallels
    • most NLI datasets won’t work with this
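
A toy sketch of the word-level idea, assuming a tiny hand-written hypernym lexicon (`HYPERNYMS`) and a hand-picked list of downward-monotone words (`DOWNWARD_TRIGGERS`); both are hypothetical and far smaller than anything a real natural logic system would use. It only handles word-for-word aligned sentences in upward-monotone contexts, which illustrates both the soundness and the incompleteness noted above.

```python
# Hypothetical toy lexicon: each word maps to the set of its more general terms.
HYPERNYMS = {"dog": {"animal"}, "runs": {"moves"}}

# Words that can create downward-monotone contexts. The sketch refuses to
# reason about sentences containing them, so that it never over-claims.
DOWNWARD_TRIGGERS = {"no", "not", "every", "all", "few", "never"}


def natural_logic_entails(premise: str, hypothesis: str) -> str:
    """Return "entailment" only when it can be derived; otherwise "unknown"."""
    p = premise.lower().split()
    h = hypothesis.lower().split()

    # Requires a clear structural parallel: same length, aligned word by word.
    if len(p) != len(h):
        return "unknown"
    if DOWNWARD_TRIGGERS & set(p):
        return "unknown"

    for pw, hw in zip(p, h):
        # In an upward-monotone context, a word may be kept as-is
        # or replaced with a more general term.
        if pw != hw and hw not in HYPERNYMS.get(pw, set()):
            return "unknown"
    return "entailment"


print(natural_logic_entails("some dog runs", "some animal moves"))
print(natural_logic_entails("a dog runs fast", "a dog runs"))
```

The second call returns "unknown" even though a human would accept the inference: the sentences are not aligned word for word, so the rule cannot apply.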

theorem proving

  • attempts to translate sentences into logical forms
  • open-domain semantic parsing is still hard
  • more difficult than natural logic
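
Once sentences have logical forms, checking entailment is the mechanical part. The sketch below assumes the translation has already been done by hand into propositional formulas (written as Python functions over a truth assignment) and checks entailment by brute-force enumeration of all assignments. Real systems use richer logics and actual theorem provers; the names `entails`, `rain`, and `wet` are illustrative.

```python
from itertools import product


def entails(premise, hypothesis, atoms) -> bool:
    """True if every truth assignment satisfying `premise` also satisfies `hypothesis`."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if premise(world) and not hypothesis(world):
            return False  # counterexample: premise true, hypothesis false
    return True


atoms = ["rain", "wet"]

# Hand-translated logical forms (the hard, open-domain step is skipped here):
# premise:    "It is raining, and if it rains the ground is wet."
# hypothesis: "The ground is wet."
premise = lambda w: w["rain"] and ((not w["rain"]) or w["wet"])
hypothesis = lambda w: w["wet"]

print(entails(premise, hypothesis, atoms))
print(entails(hypothesis, premise, atoms))
```

Enumeration is exponential in the number of atoms, so it only works for toy cases, but it makes the definition concrete: entailment holds when no situation makes the premise true and the hypothesis false.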

deep learning

  • 2015-17: attempts to build DL systems that understood natural logic
  • that machinery got very complex, and BERT-style models have since replaced it

applications

  • 3 major types
    • direct application

    • nli as a research and evaluation task

      • widely used for benchmarking
      • GLUE
      • caveat
        • state-of-the-art models are very close to human performance on these benchmarks
        • in other words, the current benchmark datasets are not high quality enough to remain challenging, so they are effectively “solved”
    • nli as a pretraining task in transfer learning

      • if you teach a model NLI, it should be reasonably good at other tasks
      • take a pretrained model, fine-tune it on MNLI, and then fine-tune it again on the target task
      • this works well even in conjunction with strong baselines for pretraining like RoBERTa

beyond nli