Posts

A Brief Survey of Machine Reading Comprehension as applied to Question Answering

Machine Reading Comprehension (MRC) is a Natural Language Understanding problem that aims to automatically understand the meaning of human-generated text. “Understanding” a piece of text is too loosely defined to pose directly as a computational problem. Therefore, MRC is generally modelled as a set of tasks in which the objective is to build computational models that can answer queries about a given reference passage. A model demonstrates its language understanding by answering a variety of questions about the passage, which requires grasping its meaning and reasoning over the facts it contains.
2022-11-24
22 min read

Scraping Websites by Asking Questions: An Intro to MarkupLM

The internet is the largest public repository of information available to us. Most of the data on the internet exists in the form of web pages: structured documents that a web browser renders for humans to read. This rendering makes the information easy to understand visually, but difficult for computers to parse. Websites about products and services often contain a wealth of information in the form of key-value pairs, but extracting these into a structured format is generally difficult or time-consuming.
2023-06-10
14 min read

An Introduction to Unsupervised Topic Segmentation with Implementation in Python

Text segmentation is a Natural Language Processing task that aims to divide a text document into semantically or topically coherent sections. This is useful for creating topic-specific summaries, organising long documents into sections for ease of reading, reducing noise, improving information retrieval, and more. The goal is to identify breakpoints between adjacent sentences where the topic shifts significantly. The number and identities of the topics do not need to be known beforehand, and classifying the resulting segments into meaningful categories can be done as a later step.
2023-03-21
15 min read