The reads from one cell are guaranteed to be present in only a single shard. This makes parallel processing simple, as each shard can be processed on a separate computer, and using more shards means that more computers can process the data in parallel. However, if you perform all the calculations on a single computer, having more than one shard will not result in a speedup. Using more than one shard is therefore only relevant when running on a cluster of compute nodes.
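The guarantee above can be illustrated with a small toy example in plain R. This is only a sketch of the idea and not how Bascet itself assigns cells to shards: the `reads` table, the round-robin assignment and all variable names are made up for illustration.

```r
# Toy illustration only: this is NOT Bascet's actual sharding algorithm.
# Each read carries a cell barcode; the shard is chosen per cell, so all
# reads from the same cell end up in the same shard.
reads <- data.frame(
  cell = c("A", "B", "A", "C", "B", "D", "C", "A"),
  seq  = c("ACGT", "GGCA", "TTAG", "CATG", "GACC", "TGCA", "AACG", "CCGT"),
  stringsAsFactors = FALSE
)

num_shards <- 2

# Assign each distinct cell to a shard (round-robin over the sorted cell names)
cells <- sort(unique(reads$cell))
shard_of_cell <- setNames((seq_along(cells) - 1) %% num_shards + 1, cells)

# Look up the shard of every read through its cell
reads$shard <- unname(shard_of_cell[reads$cell])

# Split into per-shard tables that could be processed independently
shards <- split(reads, reads$shard)

# Check: every cell occurs in exactly one shard
shards_per_cell <- tapply(reads$shard, reads$cell, function(s) length(unique(s)))
stopifnot(all(shards_per_cell == 1))
```

Because no cell is split across shards, any per-cell computation can run on each element of `shards` separately without communication between the workers.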

Usage

BascetShardify(
  debstat,
  numOutputShards = 1,
  outputName = "filtered",
  overwrite = FALSE,
  numThreads = NULL,
  numWriterThreads = NULL,
  totalMem = NULL,
  streamArenaMem = NULL,
  streamBufferSize = NULL,
  runner = GetDefaultBascetRunner(),
  bascetInstance = GetDefaultBascetInstance()
)

Arguments

debstat

Plan for sharding provided by PrepareSharding

numOutputShards

How many shards to generate for each input prefix

outputName

Name of the output file, which will contain the properly sharded, debarcoded reads

overwrite

Force overwriting of existing files. The default is to do nothing if the files exist

numThreads

Number of threads to use per job. Default is the number from the runner

numWriterThreads

Advanced settings: Number of writer threads to use per job

totalMem

How much memory to use. Extracted from runner if set

streamArenaMem

Advanced settings: How much memory to use for streaming arena (fraction, given as e.g. "10%")

runner

The job manager, specifying how the command will be run (e.g. locally, or via SLURM)

bascetInstance

A Bascet instance

Value

A runner job (details depend on the runner)
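Examples

The following is a minimal sketch of a call, using only the arguments listed under Usage. It assumes that the package providing BascetShardify is loaded and that `debstat` is a sharding plan already obtained from PrepareSharding. The wrapper `shardify_for_nodes` and its `num_nodes` argument are hypothetical names introduced here for illustration.

```r
# Sketch only: assumes BascetShardify is available in the session and that
# `debstat` is the sharding plan returned by PrepareSharding.
# `shardify_for_nodes` is a hypothetical helper, not part of the package.
shardify_for_nodes <- function(debstat, num_nodes) {
  BascetShardify(
    debstat,
    # Applies per input prefix, so the total number of shards may be larger
    numOutputShards = num_nodes,
    outputName = "filtered",
    # Leave existing output files untouched
    overwrite = FALSE
  )
}
```

Called as `shardify_for_nodes(debstat, 4)`, this would request four shards per input prefix and return the runner job. The remaining arguments keep their defaults, so the default runner and Bascet instance are used.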