On-Demand (Lazy) Inputs

Salsa inputs work best if you can easily provide all of the inputs upfront. However, sometimes the set of inputs is not known beforehand.

A typical example is reading files from disk. While it is possible to eagerly scan a particular directory and create an in-memory file tree as Salsa input structs, a more straightforward approach is to read the files lazily. That is, when a query requests the text of a file for the first time:

  1. Read the file from disk and cache it.
  2. Set up a file-system watcher for this path.
  3. Update the cached file when the watcher sends a change notification.
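
The three steps above can be sketched without Salsa or a real watcher. This is a minimal illustration using only the standard library: `LazyFiles`, `text`, and `on_change` are hypothetical names, and the `watched` set merely records the paths a real file-system watcher would be asked to observe.

```rust
use std::collections::{HashMap, HashSet};
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a lazy file cache; not part of Salsa.
#[derive(Default)]
struct LazyFiles {
    /// Cached file contents, keyed by path.
    contents: HashMap<PathBuf, String>,
    /// Paths that a real implementation would register with a watcher.
    watched: HashSet<PathBuf>,
}

impl LazyFiles {
    /// Steps 1 and 2: on the first request, register the watch, then read
    /// the file and cache it. Later requests are served from the cache.
    fn text(&mut self, path: &Path) -> io::Result<&str> {
        if !self.contents.contains_key(path) {
            // Register the watch before reading, so that a change made
            // right after the read is less likely to be missed.
            self.watched.insert(path.to_path_buf());
            let text = fs::read_to_string(path)?;
            self.contents.insert(path.to_path_buf(), text);
        }
        Ok(self.contents[path].as_str())
    }

    /// Step 3: called when the watcher reports a change to `path`.
    /// Returns `false` if the path was never requested, so there is
    /// nothing to update.
    fn on_change(&mut self, path: &Path) -> io::Result<bool> {
        match self.contents.get_mut(path) {
            Some(cached) => {
                *cached = fs::read_to_string(path)?;
                Ok(true)
            }
            None => Ok(false),
        }
    }
}
```

Note that a file edited on disk keeps its old cached text until `on_change` is called for it; the cache only moves forward in response to notifications.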

This can be achieved in Salsa by caching the inputs in your database struct and adding a method to the database trait that retrieves them from this cache.

A complete, runnable version of this file-watching setup is available as the lazy-input example.

The setup looks roughly like this:

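// Imports used by the excerpts on this page, following the lazy-input
// example. Exact crate paths may differ between dependency versions.
use std::{
    path::PathBuf,
    sync::{Arc, Mutex},
    time::Duration,
};

use crossbeam_channel::{unbounded, Sender};
use dashmap::{mapref::entry::Entry, DashMap};
use eyre::{eyre, Context, Result};
use notify_debouncer_mini::{
    new_debouncer,
    notify::{RecommendedWatcher, RecursiveMode},
    DebounceEventResult, Debouncer,
};
use salsa::{Setter, Storage};

#[salsa::input]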
struct File {
    path: PathBuf,
    #[return_ref]
    contents: String,
}

#[salsa::db]
trait Db: salsa::Database {
    fn input(&self, path: PathBuf) -> Result<File>;
}

#[salsa::db]
#[derive(Clone)]
struct LazyInputDatabase {
    storage: Storage<Self>,
    logs: Arc<Mutex<Vec<String>>>,
    files: DashMap<PathBuf, File>,
    file_watcher: Arc<Mutex<Debouncer<RecommendedWatcher>>>,
}

impl LazyInputDatabase {
    fn new(tx: Sender<DebounceEventResult>) -> Self {
        Self {
            storage: Default::default(),
            logs: Default::default(),
            files: DashMap::new(),
            file_watcher: Arc::new(Mutex::new(
                new_debouncer(Duration::from_secs(1), tx).unwrap(),
            )),
        }
    }
}

#[salsa::db]
impl salsa::Database for LazyInputDatabase {
    fn salsa_event(&self, event: &dyn Fn() -> salsa::Event) {
        // don't log boring events
        let event = event();
        if let salsa::EventKind::WillExecute { .. } = event.kind {
            self.logs.lock().unwrap().push(format!("{:?}", event));
        }
    }
}

#[salsa::db]
impl Db for LazyInputDatabase {
    fn input(&self, path: PathBuf) -> Result<File> {
        let path = path
            .canonicalize()
            .wrap_err_with(|| format!("Failed to canonicalize path {}", path.display()))?;
        Ok(match self.files.entry(path.clone()) {
            // If the file already exists in our cache then just return it.
            Entry::Occupied(entry) => *entry.get(),
            // If we haven't read this file yet set up the watch, read the
            // contents, store it in the cache, and return it.
            Entry::Vacant(entry) => {
                // Set up the watch before reading the contents to try to avoid
                // race conditions.
                let watcher = &mut *self.file_watcher.lock().unwrap();
                watcher
                    .watcher()
                    .watch(&path, RecursiveMode::NonRecursive)
                    .unwrap();
                let contents = std::fs::read_to_string(&path)
                    .wrap_err_with(|| format!("Failed to read {}", path.display()))?;
                *entry.insert(File::new(self, path, contents))
            }
        })
    }
}

  • We declare a method on the Db trait that gives us a File input on demand. It takes &self, so it only requires a &dyn Db, not a &mut dyn Db.
  • There should be only one input struct per file, so we implement that method using a cache (DashMap behaves roughly like a RwLock<HashMap>).
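
To make the RwLock<HashMap> comparison concrete, here is a minimal standard-library sketch of a get-or-create lookup that works through a shared reference. `FileId` and `FileCache` are hypothetical stand-ins for the `File` input struct and the database, and the locking strategy shown is an illustration of the idea rather than a description of how DashMap is implemented.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::RwLock;

/// Hypothetical `Copy` handle standing in for the `File` input struct.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FileId(u32);

/// Hypothetical cache; `RwLock<HashMap>` plays the role of `DashMap`.
#[derive(Default)]
struct FileCache {
    files: RwLock<HashMap<PathBuf, FileId>>,
}

impl FileCache {
    /// Takes `&self`: the lock provides the interior mutability, so
    /// callers never need exclusive access to the cache.
    fn input(&self, path: &Path) -> FileId {
        // Fast path: a shared read lock suffices if the handle exists.
        // The read guard is released at the end of this statement.
        let existing = self.files.read().unwrap().get(path).copied();
        if let Some(id) = existing {
            return id;
        }
        // Slow path: take the write lock and go through `entry`, because
        // another thread may have inserted the path between the two locks.
        let mut files = self.files.write().unwrap();
        let next = FileId(files.len() as u32);
        let id = *files.entry(path.to_path_buf()).or_insert(next);
        id
    }
}
```

Requesting the same path twice yields the same handle, which is the property the cache exists to guarantee: one input per file.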

The driving code that's doing the top-level queries is then in charge of updating the file contents when a file-change notification arrives. It does this by updating the Salsa input in the same way that you would update any other input.

Here we implement a simple driving loop that recompiles the code whenever a file changes. You can use the logs to check that only the queries that could have changed are re-evaluated.

fn main() -> Result<()> {
    // Create the channel to receive file change events.
    let (tx, rx) = unbounded();
    let mut db = LazyInputDatabase::new(tx);

    let initial_file_path = std::env::args_os()
        .nth(1)
        .ok_or_else(|| eyre!("Usage: ./lazy-input <input-file>"))?;

    // Create the initial input using the input method so that changes to it
    // will be watched like the other files.
    let initial = db.input(initial_file_path.into())?;
    loop {
        // Compile the code starting at the provided input. This reads any
        // other needed files using the on-demand mechanism.
        let sum = compile(&db, initial);
        let diagnostics = compile::accumulated::<Diagnostic>(&db, initial);
        if diagnostics.is_empty() {
            println!("Sum is: {}", sum);
        } else {
            for diagnostic in diagnostics {
                println!("{}", diagnostic.0);
            }
        }

        for log in db.logs.lock().unwrap().drain(..) {
            eprintln!("{}", log);
        }

        // Wait for file change events; the output can't change unless the
        // inputs change.
        for event in rx.recv()?.unwrap() {
            let path = event.path.canonicalize().wrap_err_with(|| {
                format!("Failed to canonicalize path {}", event.path.display())
            })?;
            let file = match db.files.get(&path) {
                Some(file) => *file,
                None => continue,
            };
            // `path` has changed, so read it and update the contents to match.
            // This creates a new revision and causes the incremental algorithm
            // to kick in, just like any other update to a salsa input.
            let contents = std::fs::read_to_string(path)
                .wrap_err_with(|| format!("Failed to read file {}", event.path.display()))?;
            file.set_contents(&mut db).to(contents);
        }
    }
}
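
What the logs are meant to show can be illustrated with a toy model. The sketch below is not Salsa's algorithm, which tracks dependencies between queries in far more detail. It only demonstrates the observable behaviour: after one input is updated, only the computation that read that input runs again. `ToyDb`, `set`, and `sum` are hypothetical names, and the per-file sum is an assumed stand-in for the real compile query.

```rust
use std::collections::HashMap;

/// Toy model only; Salsa's real dependency tracking is more sophisticated.
#[derive(Default)]
struct ToyDb {
    /// Input text per file name, with the revision at which it was last set.
    inputs: HashMap<String, (String, u64)>,
    /// Memoized sum per file, with the input revision it was computed from.
    memo: HashMap<String, (u64, i64)>,
    /// Global revision counter, bumped on every input update.
    revision: u64,
    /// One entry per computation that actually executed.
    logs: Vec<String>,
}

impl ToyDb {
    /// Updating an input creates a new revision.
    fn set(&mut self, file: &str, text: &str) {
        self.revision += 1;
        self.inputs
            .insert(file.to_string(), (text.to_string(), self.revision));
    }

    /// Sums the whitespace-separated integers in `file`, reusing the
    /// memoized value if the input has not changed since it was computed.
    /// Returns `None` for a file that was never set.
    fn sum(&mut self, file: &str) -> Option<i64> {
        let (text, changed_at) = self.inputs.get(file)?;
        if let Some((computed_at, value)) = self.memo.get(file) {
            if computed_at == changed_at {
                return Some(*value);
            }
        }
        self.logs.push(format!("executed sum({})", file));
        let value: i64 = text
            .split_whitespace()
            .filter_map(|n| n.parse::<i64>().ok())
            .sum();
        self.memo.insert(file.to_string(), (*changed_at, value));
        Some(value)
    }
}
```

After setting two files, computing both sums, and then updating only one of them, a second round of `sum` calls adds a log entry for the updated file alone.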