Science Communication on YouTube.

I do not TikTok but I do spend time on YouTube.

1. Veritasium: Derek Muller explores scientific mysteries and misconceptions.

Veritasium YouTube Channel

2. SmarterEveryDay: Destin Sandlin delves into the mechanics of the world through experiments and slow-motion footage.

SmarterEveryDay YouTube Channel

3. Numberphile: Brady Haran collaborates with mathematicians to discuss intriguing number-related topics.

Numberphile YouTube Channel

4. MinutePhysics: Henry Reich offers quick and simple explanations of physics concepts.

MinutePhysics YouTube Channel

5. SciShow: Hosted by Hank Green and others, SciShow presents daily videos on a wide range of scientific topics.

SciShow YouTube Channel

6. CrashCourse: Founded by John and Hank Green, CrashCourse provides educational videos on various subjects, including science.

CrashCourse YouTube Channel

7. Physics Girl: Dianna Cowern explores physics phenomena with engaging demonstrations and experiments.

Physics Girl YouTube Channel

8. AsapSCIENCE: Mitchell Moffit and Gregory Brown create weekly videos that answer scientific questions and explain concepts.

AsapSCIENCE YouTube Channel

9. It’s Okay To Be Smart: Hosted by Joe Hanson, this channel explores the science behind everyday phenomena.

It’s Okay To Be Smart YouTube Channel

10. Milo Rossi: Milo Rossi critically examines pseudoscientific claims and explores archaeological topics, aiming to debunk myths and present accurate historical information.

Milo Rossi YouTube Channel

11. History with Kayleigh: Hosted by Kayleigh Düring, this channel delves into ancient history, human evolution, and archaeological discoveries, providing insights into early civilizations and their structures.

History with Kayleigh YouTube Channel

These are channels that are run by individuals or small teams. Many of the national STEM societies also have YouTube channels, but the difference in reach is striking: the Numberphile channel has 4.62 million subscribers while the American Mathematical Society has 6.79K. A standout for me is the American Chemical Society (82K subscribers), whose ACSReactions channel (546K subscribers) is an excellent example of digesting research for the general public.

Reproducibility

I have been working with several software tools (Kristopher Kyle’s excellent TAASSC and TAALES) that have Graphical User Interfaces (GUIs).
GUIs are great for ease of use, but when it comes to reproducibility, they often fall short. On the other hand, Command Line Interfaces (CLIs) tend to be much better suited for workflows that need to be rerun consistently or when you would like to automate the running of several experiments consecutively. However, both types of tools could do better in ensuring that past work is easily repeatable.

The GUI Problem: Convenience vs. Control

GUI-based tools are often designed with interactivity and accessibility in mind. They allow users to adjust settings, click buttons, and visually inspect results. However, this interactivity often comes at the cost of transparency:

  1. Hidden State: Many GUI tools store configuration settings in memory or in obscure locations that are not easily accessible. Kyle’s tools preserve settings from run to run, which is very helpful; in other systems a user might tweak sliders and dropdown menus without any clear record of what was actually used to generate a result.
  2. Reproducibility Issues: Since the process relies on manual input, it becomes difficult to repeat an exact run later unless the user painstakingly documents every step.
  3. Lack of Scriptability: Unlike CLI tools, which can be run as part of automated pipelines, GUI tools often require manual interaction, making them harder to integrate into systematic workflows.

One Fix: Exportable Run Configurations

One way GUI tools can improve their reproducibility is by allowing users to export a small, human-readable file (e.g., YAML, TOML) containing all relevant settings from a given run. This file could then be used to:
• Reload the exact same configuration later.
• Share a complete record of a run with colleagues.
• Allow the tool to be run in a batch mode using the exported file.

This would give users the convenience of a GUI while retaining the ability to revisit and modify previous work in a structured way.
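
As a minimal sketch of what such an export could look like (JSON from the Python standard library is used here; YAML or TOML would work the same way with the appropriate libraries, and the settings shown are purely hypothetical):

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def export_run_config(settings: dict, path: str) -> None:
    """Write a human-readable record of the settings used for a run."""
    record = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "settings": settings,
    }
    Path(path).write_text(json.dumps(record, indent=2))


def load_run_config(path: str) -> dict:
    """Reload a previously exported configuration."""
    return json.loads(Path(path).read_text())["settings"]


# Hypothetical settings that a GUI might otherwise hold only in memory.
settings = {"input_dir": "texts/", "index": "lemma", "min_freq": 5}
export_run_config(settings, "run_config.json")
assert load_run_config("run_config.json") == settings
```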

Why CLI Wins in Reproducibility

CLI tools generally have a leg up when it comes to reproducibility. They allow users to specify explicit arguments, making it easier to track what parameters were used. Moreover, it is easy to record the command that was issued. However, even CLI tools can fail to ensure complete reproducibility unless they explicitly capture metadata.
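
A minimal sketch of a CLI that records its own invocation (the tool and its argument names are hypothetical):

```python
import argparse
import json
import sys
from datetime import datetime, timezone

parser = argparse.ArgumentParser(description="Hypothetical analysis tool")
parser.add_argument("--input", required=True)
parser.add_argument("--min-freq", type=int, default=5)
args = parser.parse_args()

# Save the exact invocation next to the results so the run can be repeated.
record = {
    "command": " ".join(sys.argv),
    "parameters": vars(args),
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
with open("run_record.json", "w") as f:
    json.dump(record, f, indent=2)
```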

Best Practices

The best tools should either:

  1. Embed Metadata in Output Files: Any file produced should contain a metadata section with all the parameters used in its creation. For instance, a text-based output could start with a block listing the command-line arguments, the software version, and any relevant hashes (see the sketch after this list); or
  2. Generate a Standalone Config File: As an alternative, tools should produce a separate run configuration file (YAML, TOML, or JSON) that contains all relevant information about the run, including input files, parameters, and any dependency versions. The tool should accept this file as input to rerun the process.
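
A minimal sketch of the first option, assuming a hypothetical tool (“mytool”, with a made-up version string) that writes tab-separated results and carries its provenance in a commented header:

```python
import sys
from datetime import datetime, timezone

TOOL_VERSION = "0.3.1"  # hypothetical version string for this sketch


def write_output_with_metadata(rows, path):
    """Write tab-separated rows preceded by a commented provenance header."""
    with open(path, "w") as f:
        f.write(f"# generated_by: mytool {TOOL_VERSION}\n")
        f.write(f"# command: {' '.join(sys.argv)}\n")
        f.write(f"# timestamp: {datetime.now(timezone.utc).isoformat()}\n")
        for row in rows:
            f.write("\t".join(str(x) for x in row) + "\n")


write_output_with_metadata([("text_01", 0.42), ("text_02", 0.57)], "results.tsv")
```

Downstream scripts can skip the “#” lines, while a human reader sees at a glance how the file was produced.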

Typically these tools also pull in external data files. Whichever approach is taken, the tool should:

  1. Include Hashes: If possible, generate cryptographic hashes of input files and outputs to ensure integrity and detect modifications, as sketched below.
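
A minimal hashing sketch using the standard library (the file paths are placeholders):

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to handle large inputs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Placeholder paths; the digests would be stored in the run record so that
# later reruns can detect modified or substituted data files.
for p in ["corpus/texts.tsv", "wordlists/awl.txt"]:
    print(p, sha256_of(p) if Path(p).exists() else "missing")
```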

By embedding metadata in outputs or producing explicit config files, CLI tools can ensure that results are fully reproducible even when running across different environments.

Conclusion: Every Tool Should Capture Its Own Provenance

Whether a tool is GUI- or CLI-based, it should take responsibility for making its runs reproducible. This means allowing users to save structured records of their work in a way that can be reliably reloaded in the future. GUI tools should provide explicit export files, and CLI tools should ensure metadata is always captured—either in output files or via standalone configuration files.

By prioritizing reproducibility, we make research, data analysis, and engineering workflows more robust, shareable, and transparent.

Frontier Math Test Set

Understanding the Frontier Math Test Set: A Benchmark for Advanced Mathematical Reasoning

Introduction

Mathematics has long been a crucial domain for evaluating artificial intelligence (AI) capabilities, serving as a key indicator of reasoning, abstraction, and problem-solving skills. The Frontier Math Test Set has emerged as a benchmark designed to assess advanced mathematical reasoning in AI systems. Unlike standard test sets that focus on rote computation or well-structured problem-solving, the Frontier Math Test Set challenges models with complex, often open-ended mathematical problems that require deep understanding and innovative reasoning strategies.

Fields Medalist Timothy Gowers has commented on the exceptional difficulty of the problems in the Frontier Math Test Set, stating:

“[The questions I looked at] were all not really in my area and all looked like things I had no idea how to solve…they appear to be at a different level of difficulty from IMO problems.” (Epoch AI)

This test set made a huge splash with the announcement of OpenAI’s o3 model.

To see why, consider that the previous state of the art, Gemini 1.5 Pro, scored just 2.3%, against the far higher performance OpenAI claimed for o3.

However, details are emerging that muddy the waters, namely that OpenAI retains ownership of these questions and has access to the problems and solutions, with the exception of a holdout set.

Clarifying the Creation and Use of the FrontierMath Benchmark

Visit the official Frontier Math Test Set webpage

What Is the Frontier Math Test Set?

The Frontier Math Test Set is a collection of mathematical problems designed to test the limits of AI reasoning. The problems span multiple mathematical domains, including:

  • Algebra
  • Geometry
  • Number theory
  • Combinatorics
  • Calculus
  • Mathematical proofs

These problems are carefully curated to assess how well an AI model can go beyond pattern recognition and apply true problem-solving techniques similar to those used by human mathematicians. Unlike traditional datasets like MATH or GSM8K, the Frontier Math Test Set often includes problems that demand multi-step reasoning, implicit knowledge, and even creative insight.

How the Test Set Is Structured

The test set is structured to ensure a gradient of difficulty, starting with intermediate-level problems and scaling up to advanced mathematical challenges. Problems in the test set are typically classified into:

  • Routine Problems: Require standard techniques but may still be computationally intensive.
  • Non-Routine Problems: Demand novel approaches and reasoning beyond typical problem-solving heuristics.
  • Proof-Based Problems: Involve constructing logical arguments rather than finding a numerical answer.

Each problem is accompanied by a ground truth solution, often including step-by-step derivations to allow for better evaluation of AI reasoning pathways.

 

Academic Word List

The Academic Word List: A Foundational Resource for Academic English

The Academic Word List (AWL), developed by Averil Coxhead at Victoria University of Wellington, is a crucial tool for students, educators, and researchers seeking to enhance their proficiency in academic English. The AWL provides a structured vocabulary framework that bridges general English and specialized disciplinary language, making it a fundamental resource in higher education and English for Academic Purposes (EAP) instruction.

Composition and Structure of the Academic Word List

The AWL consists of 570 word families that appear frequently in academic texts but are not included in the most common 2,000 words of English. These words were identified through a comprehensive corpus-based analysis of 3.5 million words from academic texts across four broad disciplinary categories: Arts, Commerce, Law, and Science. The goal of the AWL is to provide learners with a lexicon that is applicable across disciplines, supporting both comprehension and production of academic discourse.

To facilitate learning and usage, the AWL is organized into 10 sublists ranked by frequency. The first sublist contains the most commonly occurring words in academic writing (e.g., analyze, concept, section), while the final sublist includes less frequent but still significant terms (e.g., adjacent, forthcoming, persistent). By structuring the list in this way, Coxhead’s work provides learners with a gradual and systematic approach to acquiring academic vocabulary.
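
As a rough illustration of how the list can be used programmatically, the sketch below computes the proportion of tokens in a text that are AWL headwords. It assumes a hypothetical plain-text file awl_headwords.txt with one headword per line and, for simplicity, ignores inflected members of each word family.

```python
import re
from pathlib import Path

# Hypothetical file with one AWL headword per line; the list itself is
# distributed on the Victoria University of Wellington site. For simplicity
# this checks headwords only and ignores inflected family members.
awl = {
    line.strip().lower()
    for line in Path("awl_headwords.txt").read_text().splitlines()
    if line.strip()
}


def awl_coverage(text: str) -> float:
    """Proportion of word tokens in the text that are AWL headwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in awl for t in tokens) / len(tokens) if tokens else 0.0


print(awl_coverage("We analyse the data and interpret the results in context."))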

Significance and Pedagogical Applications

The AWL has had a profound impact on academic writing, reading comprehension, and language instruction. For non-native English speakers, mastery of AWL vocabulary can significantly enhance their ability to engage with complex academic texts and articulate their ideas more precisely in written assignments. Additionally, for native English-speaking students, a strong command of AWL vocabulary contributes to the clarity and formality of academic writing.

The AWL is widely used in EAP programs, where it serves as a foundation for vocabulary instruction. It also informs curriculum development, assessment design, and self-directed learning strategies. For instance, learners preparing for high-stakes language proficiency exams such as IELTS and TOEFL benefit from explicit instruction in AWL vocabulary, as these words frequently appear in reading and writing sections of such assessments.

Potential Limitations

In their 2020 study, “Word lists and the role of academic vocabulary use in high stakes speaking assessments,” Smith, Kyle, and Crossley examined the effectiveness of various academic vocabulary lists, including the Academic Word List (AWL), in predicting performance on TOEFL speaking tasks. Their findings indicated only weak associations between the use of words from these lists and speaking scores, suggesting that the AWL and similar lists may not fully capture the lexical demands of academic speaking assessments. The authors recommend developing specialized vocabulary lists tailored to the specific requirements of academic speaking contexts.


https://www.wgtn.ac.nz/lals/resources/academicwordlist

Smith, G. F., Kyle, K., & Crossley, S. A. (2020). Word lists and the role of academic vocabulary use in high stakes speaking assessments. International Journal of Learner Corpus Research, 6(2), 193–219. https://doi.org/10.1075/ijlcr.20008.smi

Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34(2), 213–238. https://doi.org/10.2307/3587951

University of South Florida Word Association, Rhyme, and Word Fragment Norms

The University of South Florida (USF) Word Association, Rhyme, and Word Fragment Norms is a dataset compiled to study word relationships and cognitive processes in language. This database provides norms for word associations, rhyming patterns, and word fragment completion, making it a valuable resource for psycholinguistic research.

The database provides:

1. Word Association Norms

• Based on free word association tasks where participants were given a stimulus word and asked to provide the first word that came to mind.

• Provides forward associations (from cue to response) and backward associations (from response to cue), measuring how strongly words are linked in human cognition.

• Example: The stimulus “dog” might elicit responses like “cat” or “bark.”

• Contains approximately 5,000 stimulus words.

• Each word was presented to multiple participants (often 100+), resulting in over 72,000 unique word associations.

2. Rhyme Norms

• Includes lists of words that participants judge as rhyming with a given stimulus.

• Helps in phonological processing research and understanding how sound similarities influence memory and retrieval.

• Example: The word “hat” might elicit rhyming words like “bat” or “mat.”

3. Word Fragment Completion Norms

• Participants were given partial word fragments and asked to complete them.

• Used to study lexical access and retrieval processes.

• Example: Given “_ouse,” common completions might be “house” or “mouse.”
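
A small sketch of how one might query forward association strengths, assuming the norms have been exported to a CSV with hypothetical column names cue, target, and fsg (forward strength):

```python
import csv
from collections import defaultdict

# Assumes a CSV export with hypothetical column names: "cue", "target",
# and "fsg" (forward strength, i.e. the proportion of participants who
# produced the target in response to the cue).
forward = defaultdict(dict)
with open("usf_norms.csv", newline="") as f:
    for row in csv.DictReader(f):
        forward[row["cue"].lower()][row["target"].lower()] = float(row["fsg"])

# Strongest associates of "dog" -- e.g. "cat" or "bark".
associates = sorted(forward["dog"].items(), key=lambda kv: -kv[1])[:5]
print(associates)
```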

Relevance to Lexical Sophistication

The norms are a relevant measure of Lexical Sophistication because they capture contextual distinctiveness, which assesses how diversely a word is used in different contexts.


Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (1998). The University of South Florida word association, rhyme, and word fragment norms. http://www.usf.edu/FreeAssociation/.

English Lexicon Project

 

The ELP (English Lexicon Project) is a large, publicly available psycholinguistic database that provides behavioral norms for word recognition. It includes data collected from lexical-decision and word-naming tasks performed by native English speakers. The database contains response latencies, accuracy rates, and other relevant linguistic information for a large set of words.

The database includes the following measures:

Lexical-Decision Latencies: Measures the time it takes for participants to decide whether a given string of letters is a real English word. This is the so-called lexical decision task and often appears in the literature as LDT.

Word-Naming Latencies: In the speeded naming task, subjects are presented with a visual word (or sometimes a nonword) and are asked to name the word aloud as quickly and as accurately as possible.

Response Accuracy: Tracks how often participants correctly identify words in lexical-decision and word-naming tasks.

Coverage: Contains data for 40,481 real words and an equal number of nonwords used in lexical-decision tasks.

Data Source: Collected from 816 native English-speaking subjects across different institutions.
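
A minimal sketch of the kind of analysis the ELP supports, assuming an item-level export with hypothetical column names Word, Length, and LDT_RT (the actual column names depend on the export requested from the ELP site):

```python
import csv
from statistics import correlation  # Python 3.10+

# Assumes an item-level export with hypothetical column names "Length" and
# "LDT_RT" (mean lexical-decision latency in ms).
lengths, latencies = [], []
with open("elp_items.csv", newline="") as f:
    for row in csv.DictReader(f):
        try:
            lengths.append(float(row["Length"]))
            latencies.append(float(row["LDT_RT"]))
        except ValueError:
            continue  # skip items with missing values

# Longer words tend to yield slower lexical-decision latencies.
print(f"r(length, LDT latency) = {correlation(lengths, latencies):.2f}")
```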

The ELP is commonly used in psycholinguistics, cognitive science, and natural language processing to study word recognition, reading comprehension, and lexical access. It serves as a valuable resource for research on word frequency effects, lexical processing, and language development.


Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual Word Recognition of Single-Syllable Words. Journal of Experimental Psychology: General, 133(2), 283–316. https://doi.org/10.1037/0096-3445.133.2.283

Operationalizing VACs

We use the TAASSC tool of Kristopher Kyle and hence inherit his operationalization of VACs. This is pretty straightforward: Kyle operationalizes a VAC in his thesis as

Verb argument constructions in TAASSC are defined as a main verb and all direct dependents that verb takes.

He notes that there are some limitations of this definition:

Furthermore, the definition of a verb argument construction was largely determined based on the features analyzed by the Stanford Neural Network Dependency Parser (Chen & Manning, 2014). While this approach was straightforward and likely reduced error rates, it is possible that distinctions between VACs were made that were not appropriate. For example, a VAC (e.g., subject-verb-object) that includes a subordinating conjunction (i.e, because) was counted as a separate VAC type from its non-subordinated counterpart.

There are some important details hidden in this definition. It clearly depends on the dependency parse produced by the Stanford Neural Network Dependency Parser [we used version 3.5.1 – the last version that uses the SD tagset], but that parser produces three different dependency representations!

The representations are:

1. Basic Dependencies [type="basic-dependencies"]

Basic dependencies provide a simple, straightforward representation of the syntactic structure of a sentence. Each word in the sentence is linked to its head word via a grammatical relation, forming a tree structure.

2. Collapsed Dependencies [type="collapsed-dependencies"]

Collapsed dependencies simplify the representation by collapsing certain prepositional and conjunctive relations into a more direct dependency, often improving downstream processing tasks. Specifically, it converts prepositional phrases into direct relations.

3. Collapsed and Propagated Dependencies [type="collapsed-ccprocessed-dependencies"]

This representation extends the collapsed dependencies by further propagating conjunct dependencies. If a verb is connected to a conjunction (and, or), the dependency relations are duplicated for each conjunct.

Kyle uses the third of these to operationalize his VACs. He also makes other modifications to the tree before producing the definitive list of VACs. I hope to examine these in detail with examples; if they are documented in Kyle’s thesis, I can’t find it.

Explanations of the dependency representations are given in the Stanford Dependencies Manual.
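
As an illustration only (TAASSC uses the Stanford Neural Network Dependency Parser with the collapsed-ccprocessed representation plus further modifications, not spaCy), the sketch below shows the core idea of pairing a main verb with the labels of its direct dependents:

```python
import spacy

# Illustration only: this is not Kyle's pipeline; it just demonstrates
# "main verb plus all direct dependents" on a spaCy parse.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("She gave her brother the keys because he asked politely.")

for token in doc:
    if token.pos_ == "VERB":
        slots = sorted(c.dep_ for c in token.children if c.dep_ != "punct")
        print(token.lemma_, "->", "-".join(slots) or "(no dependents)")
```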

Verb Argument Constructions

Together with my colleagues J. Elliott Casal and Christopher Stewart, I am submitting a paper looking at the use of VACs by Generative AI. We used Kristopher Kyle’s excellent TAASSC tool (version 1.3.8) to extract these VACs.

A Verb Argument Construction (VAC) refers to a linguistic structure in which a verb and its associated arguments (such as subject, object, and indirect object) form a conventionalized pattern that conveys a specific meaning. The concept of VACs is rooted in Construction Grammar (Goldberg, 1995, 2006), which posits that grammatical constructions—combinations of words with particular syntactic and semantic properties—are fundamental units of language.

VACs are important because they reveal how speakers of a language systematically associate verb meanings with particular syntactic structures. For example, the ditransitive construction (“X gives Y Z”) inherently expresses transfer, regardless of the verb used (e.g., “give,” “send,” “tell”). Likewise, the caused-motion construction (“X causes Y to move Z”) conveys movement (e.g., “throw the ball into the box,” “push the chair out the door”). These constructions shape how verbs function in different syntactic frames, impacting language acquisition, processing, and variation (Ellis & Ferreira-Junior, 2009).

Importance of VACs in Linguistics:

1. Language Learning and Acquisition

VACs play a crucial role in first- and second-language acquisition. Research shows that learners acquire constructions holistically before abstracting verb-specific rules (Ellis, 2002). This supports the usage-based model of language learning, where exposure to frequent constructions facilitates acquisition.

2. Syntax-Semantics Interface

VACs bridge syntax and semantics by encoding meaning through syntactic structures. Goldberg (2006) argues that meaning is not solely derived from individual verbs but also from the construction itself, meaning that verbs “inherit” meaning from the VACs they appear in.

3. Computational and Corpus Linguistics

VACs have been widely studied in corpus linguistics and natural language processing (NLP) for verb classification, semantic role labeling, and machine translation (Stefanowitsch & Gries, 2003). By analyzing large corpora, researchers can identify probabilistic patterns in verb usage.

4. Cross-Linguistic Comparisons

VACs vary across languages, making them important for typological studies. Some languages rely more on argument structure constructions than verb morphology to express meaning (Levin, 1993). Understanding these variations informs linguistic theory and translation studies.

Key References:

• Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143–188.

• Ellis, N. C., & Ferreira-Junior, F. (2009). Construction learning as a function of frequency, frequency distribution, and function. The Modern Language Journal, 93(3), 370–385.

• Goldberg, A. E. (1995). Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press.

• Goldberg, A. E. (2006). Constructions at Work: The Nature of Generalization in Language. Oxford University Press.

• Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.

• Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.

Testing Academic Prototypes

Most academic software projects that I work with are very different from code developed by professional developers. There is frequently no unit testing. I work with projects that fall into three broad classes:

  1. local programs.
  2. web apps that are light clients that do almost nothing except call an API.
  3. web apps that do extensive client side work and that interface with external databases.

Testing is normally done by live testers. This is useful for ensuring that the system runs and for identifying UX issues, but testers may not catch cases where the code runs yet delivers results that are not entirely correct. Live testing is also not easily replicable. Absent unit testing, we are frequently faced with doing end-to-end testing. Even with unit tests there are several reasons to try to build an end-to-end test suite. An end-to-end test suite in its simplest form is a collection of inputs and expected outputs. For testing CLIs and APIs this is normally just a spreadsheet of input parameters and expected output (see the sketch after the list below). For GUI apps and web apps it means a list of actions within the app and an expected output, and it will require a testing framework (such as Selenium or BrowserStack). There are several reasons to build such a test suite:

  1. there is unlikely to be a proper software requirements specification for an academic project. An end-to-end test suite documents what the researcher understands as correct functioning. As the “specifications” evolve, we add new test lines that capture the new functioning and revise the old ones as necessary.
  2. as with classic test suites, as we make maintenance or performance changes to existing code we can check that the code continues to function as expected.
  3. these test suites serve as excellent performance test suites for profiling tools.
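
A minimal sketch of such a suite for a CLI, assuming a hypothetical tool mytool and a CSV test table e2e_cases.csv with columns args and expected:

```python
import csv
import subprocess

import pytest


def load_cases(path: str = "e2e_cases.csv"):
    """Read rows of CLI arguments and expected output from the test table."""
    with open(path, newline="") as f:
        return [(row["args"], row["expected"]) for row in csv.DictReader(f)]


@pytest.mark.parametrize("args,expected", load_cases())
def test_end_to_end(args, expected):
    # Run the (hypothetical) tool and compare its output to the table.
    result = subprocess.run(
        ["mytool", *args.split()],
        capture_output=True,
        text=True,
        check=True,
    )
    assert result.stdout.strip() == expected
```

When the “specification” changes, the change is a new or revised row in the CSV rather than new test code.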

 

RAG Time

The world of language models is abuzz with the latest trend: Retrieval-Augmented Generation (RAG). But what exactly is it, and why is it generating such excitement? In a nutshell, RAG models take the traditional approach of language models to a new level by consulting external knowledge sources before generating text. This allows them to produce more factually accurate and nuanced responses, even for complex or open-ended questions, and to cite their sources.

Think of it like this: imagine a student preparing for a test. They can memorize facts (like traditional language models), but a truly insightful answer might require looking things up in a textbook or online (like RAG models do). This ability to access and integrate external information is what makes RAG models stand out.
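
A toy sketch of the retrieve-then-generate loop (retrieval here is naive keyword overlap over an in-memory list of documents, and generate is a placeholder for a real model call):

```python
# Toy retrieve-then-generate loop. A real system would use a vector index
# for retrieval and an actual language-model call in place of generate().
DOCS = [
    "The James Webb Space Telescope launched in December 2021.",
    "Retrieval-Augmented Generation was proposed by Lewis et al. (2020).",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in DOCS]
    return [d for score, d in sorted(scored, reverse=True)[:k] if score > 0]


def generate(prompt: str) -> str:
    # Placeholder standing in for a call to a language model.
    return f"[model response to a prompt of {len(prompt)} characters]"


query = "When did the James Webb Space Telescope launch?"
context = "\n".join(retrieve(query))
answer = generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(answer)
```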

One company leading the charge in this field is Perplexity, a startup boasting a staggering 10 million monthly users. Their RAG-powered platform allows users to ask open-ended questions and receive informative, well-researched responses. Whether you’re curious about the latest scientific discoveries or seeking historical insights, Perplexity aims to provide answers that go beyond simple factoids.

But why is RAG causing such a stir? Here are some key reasons:

  • Improved Factual Accuracy: By referencing external sources, RAG models can avoid the pitfalls of traditional language models, which can sometimes generate factually incorrect or misleading information.
  • Enhanced Creativity: Access to a wider range of information allows RAG models to explore more creative and nuanced responses, making them more engaging and informative.
  • Greater Adaptability: The ability to learn from new information sources makes RAG models more adaptable to different contexts and situations, paving the way for personalized and dynamic interactions.

The future of language models is undoubtedly shaped by advancements like RAG. As these models continue to learn and evolve, we can expect even more exciting developments that bridge the gap between human and machine intelligence. So, stay tuned, because the way we interact with language and information is about to undergo a fascinating transformation!
