Jakub Sobolewski — R & Shiny, Test-Driven Software Development

I'm Jakub Sobolewski.
I build the open-source tools and write the playbooks that define how R and Shiny teams test.

15K CRAN downloads: cucumber + muttest
55 GitHub stars on my testing packages
58 articles published on testing R & Shiny
6 R Weekly highlights: featured in the newsletter's highlights

New on the blog

11 Test Smells That Make Your Tests Lie to You

Learn to recognize problems in R test code that cause your test suite to pass while hiding real bugs. Detect those issues and start writing more trustworthy tests.

Jun 2, 2026

r, tests

22 min read

Behavior-Driven Development in R Shiny: Asserting Outcomes with Then Steps

Learn how to write Then steps that assert outcomes without coupling to implementation. Build custom testthat expectations and keep your BDD assertions at the right level.

May 29, 2026

bdd, r, shiny, shinytest2, tests

7 min read

Behavior-Driven Development in R Shiny: Modeling User Behavior with When Steps

Learn how to write When steps that describe user actions without leaking implementation details. Build a clean DSL that survives UI refactors and keeps specifications readable.

May 26, 2026

bdd, r, shiny, shinytest2, tests

11 min read

I'm Jakub Sobolewski

Software engineer specializing in R and Shiny, currently at Appsilon. I created Cucumber and muttest for R, and I've taught R testing at ShinyConf 2024 and useR! 2025. I have 5+ years of experience testing production code.

My conviction: automated testing is how teams build software they can trust. I started caring about this after working on a project where every code change required a live connection to production. Terrible experience, fixable problem.

Tests should make development faster and more confident, not slower and more ceremonial. I write about testing strategies that actually fit the way real teams work.

My Open Source

Cucumber

Write behavior specifications in plain English, implement them as R functions, and run them as tests. Requirements stay in sync with the code because they are the code.

31 GitHub stars · 8,615 CRAN downloads

✗ comments as specs

Comments encode intent but not procedure. They don't separate precondition from action from outcome. They can't be run, so they drift. A stale comment is worse than no comment.

test_that("sales trend works for Electronics", {
  # load the sales data
  # get trend for "Electronics"
  # make sure it looks right
  data <- load_sales_data()
  result <- get_sales_trend(data, "Electronics")
  expect_s3_class(result, "ggplot")
})


✓ the specification

Gherkin forces you to think in procedure: what state is required, what action is taken, what outcome is observable. Vague intent doesn't survive the structure.

Feature: Sales Trends

  Scenario: User views trend for a category
    Given the sales data is loaded
    When the user views the trend for "Electronics"
    Then the sales trend plot for "Electronics" is shown

✓ the implementation

Each step of the spec maps to one R function. The English phrase becomes the step's pattern, and each {string} placeholder becomes a function argument: the same words, now executable.

given("the sales data is loaded", function(context) {
  context$data <- load_sales_data()
})

when("the user views the trend for {string}", function(category, context) {
  context$plot <- get_sales_trend(context$data, category)
})

then("the sales trend plot for {string} is shown", function(category, context) {
  expect_s3_class(context$plot, "ggplot")
  expect_equal(context$plot$labels$title, category)
})
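Cucumber for R does the matching between Gherkin lines and step definitions for you. To make the mapping concrete, here is a deliberately simplified sketch in base R, not cucumber's implementation: it supports only the {string} placeholder, and the helpers match_step() and run_step() plus the toy sales vector are hypothetical. It swaps each quoted argument in a step for {string}, compares the result with the pattern, and calls the matching function with the arguments first and the shared context last.

```r
# Hypothetical sketch of step matching; cucumber's real matcher differs.

# Replace each quoted argument in a Gherkin step with {string}. If the result
# equals the pattern, return the arguments without quotes; otherwise NULL.
match_step <- function(pattern, step) {
  quoted <- regmatches(step, gregexpr('"[^"]*"', step))[[1]]
  template <- gsub('"[^"]*"', "{string}", step)
  if (!identical(template, pattern)) return(NULL)
  gsub('^"|"$', "", quoted)
}

# Find the first matching definition and call it with the extracted
# arguments first and the shared `context` environment last.
run_step <- function(step, definitions, context) {
  for (pattern in names(definitions)) {
    args <- match_step(pattern, step)
    if (!is.null(args)) {
      fun <- definitions[[pattern]]
      return(invisible(do.call(fun, c(as.list(args), list(context = context)))))
    }
  }
  stop("No step definition matches: ", step)
}

definitions <- list(
  "the sales data is loaded" = function(context) {
    # Toy data standing in for load_sales_data()
    context$data <- c(Electronics = 120, Books = 80)
  },
  "the user views the trend for {string}" = function(category, context) {
    context$value <- context$data[[category]]
  }
)

context <- new.env()
run_step("the sales data is loaded", definitions, context)
run_step('the user views the trend for "Electronics"', definitions, context)
context$value
#> [1] 120
```

Because every step writes to the same environment, state set in a Given step is visible to the When and Then steps that follow, which is the role context plays in the definitions above.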

✓ verification

> cucumber::test()
#> ✔ | F W  S  OK | Context
#> ✔ |          1 | Feature: Sales Trends
#>
#> ══ Results ═══════════════════════════════════════════════════
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

Muttest

Mutation testing for R. Introduces small changes to your source code and checks whether your tests catch them. Reveals gaps that code coverage misses.

24 GitHub stars · 6,361 CRAN downloads

the code

A simple boundary check. Two tests cover adults and minors, but never the edge.

# R/is_adult.R
is_adult <- function(age) {
  age >= 18
}

mutation score: 50%

The >= → > mutant survived: the boundary value 18 is never tested, so your suite can't tell >= from >.

✗ the tests

test_that("is_adult returns TRUE for adults", {
  expect_true(is_adult(25))  # passes even with age > 18
})

test_that("is_adult returns FALSE for minors", {
  expect_false(is_adult(10))  # passes even with age > 18
})

run mutation testing

> muttest::muttest(plan)
#> ℹ Mutation Testing
#>   |   K |   S |   E |   T |   % | Mutator  | File
#> ✔ |   1 |   0 |   0 |   1 | 100 | >= → <=  | is_adult.R
#> x |   1 |   1 |   0 |   2 |  50 | >= → >   | is_adult.R
#>
#> ── Survived Mutants ─────────────────────────────────────────────────────────
#> is_adult.R  >= → >
#>   2-   age >= 18
#>   2+   age > 18
#>
#> ── Results ──────────────────────────────────────────────────────────────────
#> [ KILLED 1 | SURVIVED 1 | ERRORS 0 | TOTAL 2 | SCORE 50.0% ]

after the fix: 100%

Adding the boundary test kills the survivor, so every mutant now triggers a test failure.

✓ the fix

test_that("is_adult returns TRUE for adults", {
  expect_true(is_adult(25))
})

test_that("is_adult returns FALSE for minors", {
  expect_false(is_adult(10))
})

test_that("is_adult is TRUE at the boundary", {
  expect_true(is_adult(18))  # kills the >= → > mutant
})
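muttest generates and runs the mutants automatically. To show what the score means, the sketch below mimics the is_adult() example by hand in base R. It is not muttest's API or implementation: the mutants, the test functions, and the helpers is_killed() and mutation_score() are hypothetical. A mutant counts as killed when at least one test fails against it, and the score is the share of killed mutants.

```r
# Hypothetical sketch: hand-rolled mutation testing for is_adult() in base R.
# Not muttest's API or implementation; mutants and tests are written by hand.

is_adult <- function(age) age >= 18

# Each mutant swaps the >= operator, like the two mutators in the run above
mutants <- list(
  ">= -> <=" = function(age) age <= 18,
  ">= -> >"  = function(age) age > 18
)

# Each test takes an implementation and returns TRUE when it passes
original_tests <- list(
  adults = function(f) isTRUE(f(25)),
  minors = function(f) isFALSE(f(10))
)
fixed_tests <- c(original_tests, list(boundary = function(f) isTRUE(f(18))))

# A mutant is killed when at least one test fails against it
is_killed <- function(mutant, tests) {
  !all(vapply(tests, function(test) test(mutant), logical(1)))
}

# Mutation score = killed mutants / total mutants, as a percentage
mutation_score <- function(tests) {
  100 * mean(vapply(mutants, is_killed, logical(1), tests = tests))
}

# Sanity check: the real implementation passes every test
stopifnot(!is_killed(is_adult, fixed_tests))

mutation_score(original_tests)
#> [1] 50
mutation_score(fixed_tests)
#> [1] 100
```

Both original tests execute the only line of is_adult(), so line coverage is already 100% while the mutation score is 50%. That is the gap mutation testing reveals and coverage misses.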

Tutorials

Shiny Acceptance TDD

Build Shiny apps from the outside in. Write acceptance tests first, then let them drive every design decision down to the module level.

what you'll learn

✓ Transform user stories into runnable tests
✓ Build a DSL that hides UI details from specs
✓ Keep tests green as the UI evolves
✓ Structure Shiny modules for testability

✗ vague requirements

Stories written in prose stay prose. They can't be run, so nobody knows when the app actually satisfies them. Requirements drift the moment code ships.

Budget tracking
  As a user I want to see my net balance
  so that I can understand my financial situation.

  Acceptance: shows income, expenses, and net.
  // ← lives in a doc, never executed

✓ executable specification

The same scenario becomes a test. Given-When-Then forces you to name preconditions, actions, and outcomes. When it passes, the feature is done.

# tests/acceptance/test-budget.R
test_that("Scenario: I can inspect my net balance", {
  # Given
  dsl$record_income(2000)
  dsl$record_expense(500)
  # When
  dsl$inspect_finances()
  # Then
  dsl$verify_total_income(2000)
  dsl$verify_total_expenses(500)
  dsl$verify_net_balance(1500)
  dsl$teardown()
})
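The scenario only talks to dsl verbs, so it never mentions inputs, selectors, or outputs. A minimal sketch of one possible dsl object follows, assuming an in-memory fake instead of a running app; new_budget_dsl() and its internal state are hypothetical. A real DSL would drive the Shiny app (for example through shinytest2) behind the same verbs, and the scenario would not change.

```r
library(testthat)

# Hypothetical DSL backed by an in-memory fake, not a running Shiny app.
# The verbs match the scenario; only their bodies would change for a real driver.
new_budget_dsl <- function() {
  state <- new.env()
  state$income <- 0
  state$expenses <- 0
  state$shown <- NULL

  list(
    record_income = function(amount) state$income <- state$income + amount,
    record_expense = function(amount) state$expenses <- state$expenses + amount,
    # "Inspecting" snapshots what the user would see on screen
    inspect_finances = function() {
      state$shown <- list(
        income = state$income,
        expenses = state$expenses,
        net = state$income - state$expenses
      )
    },
    verify_total_income = function(expected) expect_equal(state$shown$income, expected),
    verify_total_expenses = function(expected) expect_equal(state$shown$expenses, expected),
    verify_net_balance = function(expected) expect_equal(state$shown$net, expected),
    teardown = function() rm(list = ls(state), envir = state)
  )
}

# Created before the scenario runs, e.g. in a setup or helper file
dsl <- new_budget_dsl()
```

The verify_* verbs assert on what inspect_finances() captured rather than on internal totals, mirroring the rule that Then steps check observable outcomes.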

Shiny Test-Driven Development

ShinyConf 2024. A structured approach to testing Shiny apps: inside-out unit tests, outside-in acceptance tests, and the loop that connects them.

what you'll learn

✓ Inside-out vs. outside-in strategies
✓ Automate acceptance criteria
✓ Isolate and test Shiny modules
✓ Inject fake dependencies
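Isolating a module and injecting a fake dependency can look like the sketch below, which uses shiny::testServer() to run a module's server logic without a browser. The module balance_server(), its ledger argument, and the ledger's net_balance() function are hypothetical; the point is that the module receives its data source as an argument, so a test can pass a fixed fake instead of a real database connection.

```r
library(shiny)
library(testthat)

# Hypothetical module: shows the net balance reported by an injected `ledger`.
# `ledger` is assumed to be any list with a net_balance() function.
balance_server <- function(id, ledger) {
  moduleServer(id, function(input, output, session) {
    net <- reactive({
      input$refresh  # re-read the ledger whenever the user clicks refresh
      ledger$net_balance()
    })
    output$net <- renderText(format(net()))
  })
}

# A fake ledger with fixed data stands in for the real data source
fake_ledger <- list(net_balance = function() 1500)

testServer(balance_server, args = list(ledger = fake_ledger), {
  session$setInputs(refresh = 1)
  expect_equal(net(), 1500)       # reactive defined inside the module
  expect_equal(output$net, "1500") # rendered text the user would see
})
```

In production the same module would receive a ledger backed by real storage; the module code stays the same either way.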

Behavior-Driven Development

useR! 2025. From vague wish to working code: how to collaborate with stakeholders, write Gherkin scenarios, and execute them with Cucumber for R.

what you'll learn

✓ BDD fundamentals and why they work
✓ Given-When-Then scenario structure
✓ Run specs with Cucumber for R
✓ Align code with business language

Building Quality By Testing
