Syntasa Notebook Utilities - User Reference

synutils is the unified entry point in every notebook — a Python attribute and a Scala REPL alias. Each section below lists user-callable methods with examples in both languages.

connections — connection metadata + decrypted parameters

| Method | Purpose |
| --- | --- |
| get(name) | Full connection object (parameters auto-decrypted; encrypted fields wrapped in SecretString) |
| getParam(name, param) | Single parameter value as str / String (see redaction note below) |
| getParamSealed(name, param) (Scala only) | Single parameter value as SecretString. Always wrapped, so host / port redact too. Use for .get() / .unseal() ergonomics. |
| getAllParams(name) | All parameters; preserves SecretString wrapping for encrypted fields |
| clearCache() | Drop cached responses |

Python

conn = synutils.connections.get("my_snowflake")
host = synutils.connections.getParam("my_snowflake", "host")
# Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()

Scala

val conn = synutils.connections.get("my_snowflake") 
val host = synutils.connections.getParam("my_snowflake", "host")

// Drop cached responses (force a re-fetch on next call)
synutils.connections.clearCache()

Details

Redaction — encrypted fields render as **********

After decryption, encrypted parameter fields (passwords, API keys, secret keys, private keys — whichever fields apply per connection type) are wrapped in SecretString. They redact in print/log output but still work for SDK calls. Non-encrypted parameters (host, port, database, etc.) are returned as plain strings.

Python

SecretString is a str subclass; it redacts when printed and reveals when you call .get() / .unseal() (or, if an SDK accepts a str subclass directly, you can pass it as-is).

conn = synutils.connections.get("my_snowflake") 
print(conn["parameters"])
# {'host': 'snow.example.com',
# 'database': 'analytics',
# 'username': 'svc_account',
# 'password': '**********',
# 'privateKey': '**********'}
# Plain field — no wrapping
print(conn["parameters"]["host"]) # snow.example.com
# Encrypted field — redacts in print, reveals via .get()
pw = conn["parameters"]["password"]
print(pw) # **********
print(f"pw={pw}") # pw=**********
print(pw.get()) # actual password
# Pass to a SDK (works because SecretString is a str subclass — but
# many SDKs call str() internally, which returns "**********". For those,
# call .get() explicitly to be safe.)
import snowflake.connector
conn_handle = snowflake.connector.connect(
user=conn["parameters"]["username"],
password=conn["parameters"]["password"].get(), # <- unseal for SDK
account=conn["parameters"]["account"],
)
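When several parameters feed one SDK call, it can be convenient to reveal every sealed value in a single pass. The sketch below is illustrative only: FakeSecret is a hypothetical stand-in for SecretString (assumed here to be a str subclass with a .get() method, as described above), and reveal_params is a helper name invented for this example, not part of synutils.

```python
from typing import Any, Dict


class FakeSecret(str):
    """Hypothetical stand-in for SecretString: redacts in str(), reveals via get()."""

    def __str__(self) -> str:
        return "**********"

    def get(self) -> str:
        # str.__str__ returns a plain-str copy of the underlying characters.
        return str.__str__(self)


def reveal_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params where sealed values are replaced by plain str."""
    plain: Dict[str, Any] = {}
    for key, value in params.items():
        # Sealed values are strings that also expose a callable .get().
        if isinstance(value, str) and callable(getattr(value, "get", None)):
            plain[key] = value.get()
        else:
            plain[key] = value
    return plain


params = {"host": "snow.example.com", "password": FakeSecret("s3cret")}
plain = reveal_params(params)
```

The returned dict holds raw values, so keep it in a local variable and avoid printing or logging it.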

Scala

SecretString has an implicit conversion to String, so JDBC and most SDKs work directly. Use .get() / .unseal() for explicit reveal.

val conn = synutils.connections.get("my_snowflake") 
val params = conn("parameters").asInstanceOf[Map[String, Any]]

println(params)
// Map(host -> snow.example.com, database -> analytics,
// username -> svc_account, password -> **********, privateKey -> **********)

// Encrypted field: cast to SecretString to call .get() / .unseal()
val pwd = params("password").asInstanceOf[SecretString]
println(pwd) // **********
println(pwd.get()) // actual password

// JDBC — implicit conversion fires automatically (no explicit unseal needed)
import java.sql.DriverManager
DriverManager.getConnection(jdbcUrl, "svc_account", pwd)

// String interpolation — type ascription forces the conversion
val safeUrl = s"jdbc:postgresql://host/${pwd: String}"

datasets — dataset registry + Hive table metadata

| Method | Purpose |
| --- | --- |
| get(datasetName) | DataSet object (table name, partition cols, etc.) |
| create(datasetName, fileFormat=PARQUET) | Register a new dataset |
| list(eventStoreName) | All datasets under an event store (devDatasets + prodDatasets combined). Each entry has an environment field; filter on it if you only want one env. |

DataSet object methods: tableName(), getPartitionColumns(), getNonPartitionColumns(), isPartitioned().

Python

# Fetch 
ds = synutils.datasets.get("user_events")
print(ds.tableName(), ds.isPartitioned())
# Register a new dataset (default file format is PARQUET)
from synutils.file_format import FileFormat
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat=FileFormat.AVRO)
# List all datasets registered under an event store
# (devDatasets + prodDatasets combined; filter by 'environment' for one env)
for d in synutils.datasets.list("TestStore"):
    print(d["name"], d["environment"], d["database"])
# Only PRODUCTION datasets
prod = [d for d in synutils.datasets.list("TestStore") if d["environment"] == "PRODUCTION"]

Scala

// Fetch 
val ds = synutils.datasets.get("user_events")
println(s"${ds.tableName} ${ds.isPartitioned}")

// Register a new dataset (default file format is "parquet")
synutils.datasets.create("my_new_dataset")
synutils.datasets.create("my_avro_dataset", fileFormat = "avro")

// List all datasets registered under an event store (dev + prod combined)
synutils.datasets.list("TestStore").foreach { d =>
println(s"${d("name")} ${d("environment")} ${d("database")}")
}

// Only PRODUCTION datasets
val prod = synutils.datasets.list("TestStore")
.filter(_("environment") == "PRODUCTION")

infrastructure — platform infra metadata

The full method surface, side by side. Use the canonical method in new code. Legacy aliases exist only for backward compatibility with pre-integration notebooks.

Properties (read-only attributes)

| Returns | Python | Scala | Legacy (Python) |
| --- | --- | --- | --- |
| Cloud provider: "AWS" / "GOOGLE" / "AZURE" | providerType | providerType | get_provider_type_from_metadata() |
| Cloud region | region | region | |
| GCP project ID (empty for AWS/Azure) | projectId | projectId | |
| Default storage bucket | bucket | bucket | |
| Filesystem prefix (s3://, gs://, abfs://) | fileSystemPrefix | fileSystemPrefix | |
| Full storage path (prefix + bucket) | storagePath | storagePath | |
| SSH connection type | sshType | sshType | |
| Metastore type (AWS_GLUE / HIVE / BIGQUERY) | metastoreType | metastoreType | |
| Metastore hostname | metastoreHostname | metastoreHostname | |

Section getters

| Returns | Python | Scala | Legacy (Python) |
| --- | --- | --- | --- |
| config section: region, projectId, etc. | getConfig() | getConfig() | get_config_from_metadata() |
| storage section: bucket, paths | getStorage() | getStorage() | get_storage_from_metadata() |
| network section | getNetwork() | getNetwork() | get_network_from_metadata() |
| metastore section | getMetastore() | getMetastore() | get_metastore_from_metadata() |
| security section: cloud creds, SSH | getSecurity() | getSecurity() | get_security_from_metadata() |
| Full payload (all sections) | getAll() | getAll() | get_all_from_metadata(), asDict(), get_infrasturcture_json(token=None) |
| Global init script (Optional[str] / Option[String]) | getGlobalInitScript() | getGlobalInitScript() | get_global_init_script_from_metadata() |
| Drop the cached payload | clearCache() | clearCache() | |

Generic accessors (backward-compat with legacy InfrastructureClient)

ReturnsPythonScalaLegacy (Python)
Any infraProvider section by nameget(resource)(use getAll()("infraProvider")(resource))get(token, resource)
Same — aliasget_from_metadata(resource)get_from_metadata(resource)
Full payload (legacy typo preserved)get_infrasturcture_json(token=None)get_infrasturcture_json(token)

The token parameter is accepted on the legacy aliases but ignored — auth is handled by the underlying ApiClient at construction. Sensitive values under security / metastore come back as SecretString (redact on print, work as plain str with SDKs).

# Generic — fetch any section dynamically
synutils.infrastructure.get("storage") # plain dict
synutils.infrastructure.get("notebookConfig") # {"script": "..."}
synutils.infrastructure.get("security") # values redact when printed
# Legacy aliases
synutils.infrastructure.get_from_metadata("storage")
synutils.infrastructure.get_infrasturcture_json() # full nested payload

Python returns dict everywhere; Scala returns Map[String, AnyRef]. Same shape, language-native types.

Secret fields (encrypted + sealed)

Encrypted fields are auto-decrypted by the platform and wrapped in SecretString so they redact in print / log output. Pull a value out of the dict and call .get() / .unseal() to reveal the decrypted plaintext.

| Section | Secret field names |
| --- | --- |
| metastore | metastorePassword |
| security | keyFile, accessKey, secretKey, sshUserPass |
| bigQuery | keyFile |
| bigTable | keyFile |
| redshift | password |
| kafka | sslCertificate, sslKey |
| gitConfig | personalAccessToken, oauthToken, sshKey |

If decryption fails (no key, malformed ciphertext) the original ciphertext is preserved and a warning is logged — print(meta["metastorePassword"]) still redacts to **********, but .get() returns the raw ciphertext rather than the plaintext.

Of the seven sections above, only metastore and security have dedicated getters (getMetastore() / getSecurity()). Reach the other five through the full payload: getAll() / asDict() / get_infrasturcture_json().

Python

print(synutils.infrastructure.providerType, synutils.infrastructure.bucket) 
print(synutils.infrastructure.getStorage()) # full storage dict

sec = synutils.infrastructure.getSecurity()
print(sec) # {'accessKey': '**********', 'secretKey': '**********', ...}
sec["secretKey"].get() # decrypted plaintext

meta = synutils.infrastructure.getMetastore()
meta["metastorePassword"].get() # decrypted plaintext

payload = synutils.infrastructure.getAll()
payload["infraProvider"]["security"]["secretKey"].get() # decrypted
payload["infraProvider"]["metastore"]["metastorePassword"].get() # decrypted
payload["infraProvider"]["bigQuery"]["keyFile"].get() # decrypted JSON
payload["infraProvider"]["redshift"]["password"].get() # decrypted
payload["infraProvider"]["kafka"]["sslKey"].get() # decrypted
payload["infraProvider"]["gitConfig"]["personalAccessToken"].get() # decrypted

Scala

SecretString is available as a bare name in notebook Scala sessions.

println(s"${synutils.infrastructure.providerType} ${synutils.infrastructure.bucket}") 
println(synutils.infrastructure.getStorage()) // full storage Map

val sec = synutils.infrastructure.getSecurity()
println(sec) // values render as **********

// Map[String, AnyRef] preserves the seal — values aren't auto-unsealed.
// Cast to SecretString to access .get() / .unseal().
val secretKey = sec("secretKey").asInstanceOf[SecretString]
println(secretKey) // **********
println(secretKey.get()) // decrypted plaintext

val pwd = synutils.infrastructure.getMetastore()("metastorePassword")
.asInstanceOf[SecretString]
val raw: String = pwd // implicit conversion reveals (decrypted)

eventstores — event store path / database resolver

An event store is the primary data storage unit on the platform — it pairs a cloud storage path with a metastore database, with separate values per environment (development / production / snapshot).

| Method | Purpose |
| --- | --- |
| get(value, lookupType="name", env="development") | Full event store object with resolved path / database for the requested env |
| getPath(name, lookupType="name", env="development") | Storage path only |
| getDatabase(name, lookupType="name", env="development") | Hive database name only |
| configure(defaultName=…, defaultEnv=…) | Set a default event store + environment so the path / database / name properties resolve without arguments |
| path, database, name | Properties that return the path / database / name of the default event store (requires configure(...)) |
| clearCache() | Drop the response cache |

lookupType selects how value is interpreted:

  • "name": value is the event store's logical name; env decides which environment's path/database to return.
  • "database": value is a database name; the environment is auto-detected by matching against developmentDatabase / productionDatabase in the response (the env argument is ignored).
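The two lookup modes can be pictured with a small pure-Python sketch. It is not the library's implementation: resolve_event_store is a hypothetical function, and the developmentPath / productionPath keys are assumed field names chosen to mirror the developmentDatabase / productionDatabase fields mentioned above.

```python
from typing import Dict

ENVIRONMENTS = ("development", "production")


def resolve_event_store(store: Dict[str, str], value: str,
                        lookup_type: str = "name",
                        env: str = "development") -> Dict[str, str]:
    """Pick the path / database pair for one environment of an event store record."""
    if lookup_type == "database":
        # The env argument is ignored: detect it by matching the database value.
        matches = [e for e in ENVIRONMENTS if store.get(f"{e}Database") == value]
        if not matches:
            raise LookupError(f"{value!r} matches no database of {store['name']!r}")
        env = matches[0]
    elif lookup_type != "name":
        raise ValueError(f"unsupported lookupType: {lookup_type!r}")
    return {
        "name": store["name"],
        "path": store[f"{env}Path"],          # assumed field name
        "database": store[f"{env}Database"],
        "_resolved_env": env,
    }


# Sample record with made-up values, shaped like the assumptions above.
record = {
    "name": "click_stream",
    "developmentPath": "gs://bucket/dev/click_stream",
    "developmentDatabase": "click_stream_dev",
    "productionPath": "gs://bucket/prod/click_stream",
    "productionDatabase": "click_stream_prod",
}
by_name = resolve_event_store(record, "click_stream", env="production")
by_db = resolve_event_store(record, "click_stream_dev", lookup_type="database")
```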

Examples

Python

# 1. Lookup by name + environment 
es = synutils.eventstores.get("click_stream", env="production")
print(es["path"], es["database"], es["_resolved_env"])
# gs://.../click_stream click_stream_prod production
# 2. Lookup by database — env auto-detected from the database value
es = synutils.eventstores.get("click_stream_dev", lookupType="database")
print(es["_resolved_env"]) # development
# 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env="development")
synutils.eventstores.getDatabase("click_stream", env="production")
# 4. Configure defaults — then use path / database / name as bare properties
synutils.eventstores.configure(defaultName="click_stream", defaultEnv="development")
print(synutils.eventstores.path)
print(synutils.eventstores.database)
print(synutils.eventstores.name)
# 5. Switch the default environment without changing the name
synutils.eventstores.configure(defaultEnv="production")
print(synutils.eventstores.path)
# 6. Drop cached responses (force a re-fetch on next call)
synutils.eventstores.clearCache()

Scala

// 1. Lookup by name + environment 
val es = synutils.eventstores.get("click_stream", env = "production")
println(s"${es("path")} ${es("database")} ${es("_resolved_env")}")

// 2. Lookup by database — env auto-detected
val byDb = synutils.eventstores.get("click_stream_dev", lookupType = "database")
println(byDb("_resolved_env")) // development

// 3. Just need the path or database
synutils.eventstores.getPath("click_stream", env = "development")
synutils.eventstores.getDatabase("click_stream", env = "production")

// 4. Configure defaults
synutils.eventstores.configure(defaultName = "click_stream", defaultEnv = "development")
println(synutils.eventstores.path)
println(synutils.eventstores.database)
println(synutils.eventstores.name)

// 5. Switch default environment
synutils.eventstores.configure(defaultEnv = "production")
println(synutils.eventstores.path)

// 6. Drop cached responses
synutils.eventstores.clearCache()

notifications — send platform notifications (email)

| Method | Purpose |
| --- | --- |
| send(recipients, subject, message, type="EMAIL", attachments=…, useDefaultHtmlTemplate=True) | Send a notification |

recipients is a single email or comma-separated list (e.g. "a@x.com, b@y.com"). Set useDefaultHtmlTemplate=False to send the message body verbatim (skip the server's default HTML wrapper).
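Because recipients is one comma-separated string rather than a list, notebooks that hold addresses in a Python list need to join them first. A minimal sketch, where join_recipients is a hypothetical helper name and not part of synutils:

```python
from typing import Iterable, List


def join_recipients(addresses: Iterable[str]) -> str:
    """Build a comma-separated recipients string, dropping blanks and duplicates."""
    cleaned: List[str] = []
    for address in addresses:
        address = address.strip()
        if address and address not in cleaned:
            cleaned.append(address)
    return ", ".join(cleaned)


recipients = join_recipients(["a@x.com", " b@y.com ", "a@x.com", ""])
```

The resulting string can then be passed as the recipients argument of send().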

Python

# 1. Plain file path — simplest form 
synutils.notifications.send(
recipients="a@x.com, b@y.com",
subject="Daily report",
message="See attached",
attachments=["/tmp/report.pdf"],
)
# 2. In-memory bytes via (filename, bytes) tuple
csv_bytes = b"id,name\n1,alice\n"
synutils.notifications.send(
recipients="a@x.com",
subject="Daily export",
message="Attached CSV",
attachments=[("daily.csv", csv_bytes)],
)
# 3. In-memory bytes with explicit content type — (filename, bytes, mimetype)
import json
data = {"ok": True}
synutils.notifications.send(
recipients="a@x.com",
subject="Run status",
message="See attached JSON",
attachments=[("run_status.json", json.dumps(data).encode(), "application/json")],
)
# 4. Mix of file path and in-memory content
synutils.notifications.send(
recipients="a@x.com",
subject="Job done",
message="Logs + summary",
attachments=[
"/var/log/job.log", # path
("summary.csv", b"job,seconds\nfoo,42\n"), # bytes
],
)
# 5. File-like object (e.g. io.BytesIO) also works as the second tuple element
import io
buf = io.BytesIO()
buf.write(b"col_a,col_b\n1,2\n")
synutils.notifications.send(
recipients="a@x.com",
subject="Buffered results",
message="Attached",
attachments=[("results.csv", buf.getvalue())],
)

Scala

// 1. Plain file path — simplest form (matches Python) 
synutils.notifications.send(
recipients = "a@x.com, b@y.com",
subject = "Daily report",
message = "See attached",
attachments = Seq("/tmp/report.pdf")
)

// 2. In-memory bytes via tuple
val csvBytes = "id,name\n1,alice\n".getBytes("UTF-8")
synutils.notifications.send(
recipients = "a@x.com",
subject = "Daily export",
message = "Attached CSV",
attachments = Seq(("daily.csv", csvBytes, "text/csv"))
)

// 3. Mix of file path and in-memory content
synutils.notifications.send(
recipients = "a@x.com",
subject = "Job done",
message = "Logs + summary",
attachments = Seq(
"/var/log/job.log",
("summary.txt", "Job completed in 42s".getBytes("UTF-8"), "text/plain")
)
)

// 4. Pre-built Attachment objects also still work
synutils.notifications.send(
recipients = "a@x.com",
subject = "x", message = "y",
attachments = Seq(Attachment.fromFile("/tmp/file.csv"))
)

credentials — secret store lookup

| Method | Purpose | Legacy |
| --- | --- | --- |
| get(name, key) | Fetch a single secret value; returns SecretString | read(name, key) |
| getAll(name) | All secrets under a name; returns SecretDict (Python) / SecretMap (Scala) | read(name) |
| describe(name) | Credential description | |
| getMetadata(name) | Raw metadata | |
| list() | All credentials | |

Basic usage

# Python

pwd = synutils.credentials.get("svc-account-a", "password")
all_secrets = synutils.credentials.getAll("svc-account-a")
desc = synutils.credentials.describe("svc-account-a")
print(synutils.credentials.list())
# Backward-compat alias — read() resolves to get()/getAll()
pwd = synutils.credentials.read("svc-account-a", key="password")
all_secrets = synutils.credentials.read("svc-account-a")

// Scala

val pwd = synutils.credentials.get("svc-account-a", "password")
val all = synutils.credentials.getAll("svc-account-a")
val desc = synutils.credentials.describe("svc-account-a")
println(synutils.credentials.list())

// Backward-compat alias — read() resolves to get()/getAll()
val pwd2 = synutils.credentials.read("svc-account-a", "password")
val all2 = synutils.credentials.read("svc-account-a")

Details

Redaction — values print as ********** by default

get() returns a SecretString; getAll() returns a SecretDict / SecretMap. Both override their string representation so the raw value never leaks into notebook output, logs, or exceptions.

Python

secret = synutils.credentials.get("svc-account-a", "password") 
# All of these print **********
print(secret) # **********
repr(secret) # **********
f"connecting with pw={secret}" # connecting with pw=**********
"%s" % secret # **********
secret # ********** (Jupyter cell last line)
# Logging is safe too
logger.info("got %s", secret) # got **********
# Exception messages don't leak
raise ValueError(secret) # ValueError: **********
# getAll() — every value is redacted
all_secrets = synutils.credentials.getAll("svc-account-a")
print(all_secrets)
# {'user': '**********', 'password': '**********', 'host': '**********'}

Scala

val secret = synutils.credentials.get("svc-account-a", "password") 

// All of these print **********
println(secret) // **********
secret.toString // **********
println(s"pw=$secret") // pw=**********
secret // ********** (REPL last expression)

// getAll() — every value is redacted
val all = synutils.credentials.getAll("svc-account-a")
println(all)

// Map(user -> **********, password -> **********, host -> **********)

Details

Unseal — getting the raw value for SDK calls

Use .get() (preferred) or .unseal() (legacy alias) on a single SecretString. For a whole SecretDict/SecretMap, use .unseal() to get a plain dict / Map[String, String].

Python — pass to SDKs

import boto3 
# Single value — call .get() or .unseal()
secret_key = synutils.credentials.get("aws_creds", "secret_access_key")
client = boto3.client("s3", aws_secret_access_key=secret_key.get())
# Whole bag — .unseal() returns a regular dict with raw strings
raw = synutils.credentials.getAll("aws_creds").unseal()
client = boto3.client(
"s3",
aws_access_key_id=raw["access_key"],
aws_secret_access_key=raw["secret_key"],
)

Scala — implicit conversion to String for JDBC / SDKs

import java.sql.DriverManager 

val pwd = synutils.credentials.get("db_creds", "password")

// JDBC — implicit conversion to String works directly
val conn = DriverManager.getConnection(jdbcUrl, "user", pwd)

// String interpolation — use type ascription to force the conversion
val urlWithPwd = s"jdbc:postgresql://host/${pwd: String}"

// Or explicit unseal / get
val raw: String = pwd.get() // preferred
val raw2: String = pwd.unseal() // legacy alias

// Whole bag — .unseal() / .getAll() returns Map[String, String]
val rawMap: Map[String, String] = synutils.credentials.getAll("db_creds").unseal()

Heads-up — string manipulation reveals the value. SecretString is a string subclass (Python) / has an implicit String conversion (Scala), so anything that touches the underlying characters bypasses redaction:

# Python — these all leak the raw value 
"prefix-" + secret # leaks
secret + "-suffix" # leaks
secret[:5] # leaks
secret.encode() # leaks
secret.upper() # leaks

// Scala: these force the implicit conversion and leak
val s: String = secret // leaks (type ascription)
"prefix-" + secret // leaks (implicit conversion)

Rule of thumb: printing/logging is safe; string manipulation reveals the value. Prefer f-strings / s-interpolation over + concatenation when building log lines.
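The difference between formatting and concatenation follows from how Python treats str subclasses. The sketch below uses MaskedStr, a hypothetical stand-in written for this example (the real SecretString may be implemented differently), to show why interpolation stays redacted while + and slicing expose the underlying characters.

```python
class MaskedStr(str):
    """Hypothetical stand-in for SecretString, for illustration only."""

    MASK = "**********"

    def __str__(self) -> str:
        return self.MASK

    def __repr__(self) -> str:
        return self.MASK

    def __format__(self, spec: str) -> str:
        # f-strings and format() go through __format__, so they stay masked.
        return format(self.MASK, spec)

    def get(self) -> str:
        # Plain-str copy of the real characters.
        return str.__str__(self)


secret = MaskedStr("hunter2")

safe_line = f"connecting with pw={secret}"     # uses __format__: masked
leaky_line = "connecting with pw=" + secret    # str concatenation: raw characters
leaky_slice = secret[:3]                       # slicing: raw characters
```

Concatenation and slicing operate on the stored characters directly and never consult __str__ or __format__, which is why only the interpolated form is safe to log.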

files — platform file registry

A file object in the platform pairs a base cloud-storage path with a list of files (the object's parameters). synutils.files resolves these to full cloud paths and — for DATA_FILE objects in supported formats — reads them directly into a Spark DataFrame.

Methods

| Method | Purpose |
| --- | --- |
| get(name) | Full file object dict / Map from the API (cached per name) |
| getPath(name) | Full cloud paths for all files in this object; returns List[str] / List[String] |
| getMetadata(name) | Curated subset of metadata with renamed keys |
| createDataFrame(name, fileName, sep=None, header=True, inferSchema=True) | Read a registered file into a Spark DataFrame. DATA_FILE objects only; supported fileFormat: DELIMITED, JSON, PARQUET, ORC, AVRO |
| clearCache() | Drop cached responses (force a re-fetch on next call) |

createDataFrame parameters

| Parameter | Type | Default | Purpose |
| --- | --- | --- | --- |
| name | str / String | | File object name registered in the platform |
| fileName | str / String | | Specific file within the object; must match one of the entries in the object's parameters[].name list |
| sep | str / String | None (Py) / null (Scala); falls back to the object's API-configured delimiter, then "," | Column separator. Used only for DELIMITED. |
| header | bool / Boolean | True | First row is a header. Used only for DELIMITED. |
| inferSchema | bool / Boolean | True | Infer column types from data. Used only for DELIMITED. |
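The fallback order for sep (explicit argument, then the object's configured delimiter, then a comma) can be written out as a short sketch. resolve_separator and its configured parameter are hypothetical names for this illustration; the library's internal handling is not shown in this reference.

```python
from typing import Optional


def resolve_separator(sep: Optional[str], configured: Optional[str]) -> str:
    """Choose the column separator for a DELIMITED read."""
    if sep is not None:
        return sep          # 1. caller override wins
    if configured:
        return configured   # 2. delimiter configured on the file object
    return ","              # 3. final default


explicit = resolve_separator("\t", "|")
from_object = resolve_separator(None, "|")
fallback = resolve_separator(None, None)
```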

Raises:

  • RuntimeError (Py) / SynUtilsException (Scala) if no SparkSession was supplied to init().
  • ValueError (Py) / IllegalArgumentException (Scala) if the object is not a DATA_FILE or its fileFormat is unsupported.

Examples

Python

# 1. Inspect a file object 
info = synutils.files.get("daily_report")
print(info["objectTypeKey"], info["fileFormat"])
# 2. Get full cloud paths for every file in the object
paths = synutils.files.getPath("daily_report")
# ['gs://my-bucket/reports/sales.csv', 'gs://my-bucket/reports/orders.csv']
# 3. Curated metadata subset
meta = synutils.files.getMetadata("daily_report")
# 4. Read one file as a DataFrame (uses the object's configured delimiter)
df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)
# 5. Override CSV options for this read only
df = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep="\t",
header=False,
inferSchema=False,
)
# 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
events = synutils.files.createDataFrame("event_dump", "events.json")
sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
records = synutils.files.createDataFrame("user_records", "users.avro")
# Note: AVRO requires the matching spark-avro JAR loaded into the runtime.
# 7. Drop cached responses (force a re-fetch on next call)
synutils.files.clearCache()

Scala

// 1. Inspect a file object
val info = synutils.files.get("daily_report")
println(s"${info("objectTypeKey")} ${info("fileFormat")}")

// 2. Full cloud paths for every file in the object
val paths: List[String] = synutils.files.getPath("daily_report")

// 3. Curated metadata subset
val meta = synutils.files.getMetadata("daily_report")
println(meta("type"), meta("fileFormat"))

// 4. Read one file as a DataFrame (uses the object's configured delimiter)
val df = synutils.files.createDataFrame("daily_report", "sales.csv")
df.show(5)

// 5. Override CSV options for this read only
val tsv = synutils.files.createDataFrame(
"daily_report",
"sales.tsv",
sep = "\t",
header = false,
inferSchema = false
)

// 6. JSON / PARQUET / ORC / AVRO — sep / header / inferSchema are ignored
val events = synutils.files.createDataFrame("event_dump", "events.json")
val sales = synutils.files.createDataFrame("sales_dump", "2024-01.parquet")
val orders = synutils.files.createDataFrame("orders_dump", "orders.orc")
val records = synutils.files.createDataFrame("user_records", "users.avro")

// 7. Drop cached responses
synutils.files.clearCache()

Tip: getPath() returns paths for all files in the object — useful when you want Spark to read everything in one go via spark.read.csv(synutils.files.getPath("daily_report")). createDataFrame() reads exactly one file at a time, identified by fileName.

fs — direct object-store filesystem (S3 / GCS / Azure / HDFS)

The synutils.fs object auto-routes by URI scheme. Schemeless paths (bucket/key/path) are also accepted — the scheme is inferred from synutils.infrastructure.fileSystemPrefix.
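As a rough picture of the schemeless-path rule, the sketch below qualifies a path with a filesystem prefix only when it carries no scheme of its own. qualify is a hypothetical helper for illustration; how synutils.fs performs the inference internally is an assumption here.

```python
def qualify(path: str, prefix: str) -> str:
    """Prepend the filesystem prefix to a schemeless path; leave full URIs alone."""
    if "://" in path:
        return path
    return prefix + path.lstrip("/")


schemeless = qualify("my-bucket/data/file.csv", "gs://")
already_full = qualify("s3://other-bucket/key", "gs://")
```

In a notebook, the prefix argument would come from synutils.infrastructure.fileSystemPrefix.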

Listing methods — output shape at a glance

Same shape across Python and Scala:

| Method | What it returns |
| --- | --- |
| list(path, recursive=True) | Files → basenames (last / segment); folder markers → empty string "". With recursive=False, files → basenames; subdirectories → bucket-relative prefix path (e.g. "syn-cluster-config/deps/jars/") |
| ls(path) | Top-level entries as full URIs: files (s3://bucket/prefix/file.jar) + subdirectory prefixes (s3://bucket/prefix/sub/) |
| listRecursive(path) | All files recursively as full URIs |
| list_file_paths(path) (alias for listRecursive) | All files recursively as full URIs |
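The "last / segment" rule explains why folder markers come back as empty strings from list(): a URI ending in / has nothing after its final slash. A small sketch with made-up URIs (last_segment is a name invented for this example):

```python
def last_segment(uri: str) -> str:
    """Return everything after the final '/' in a URI."""
    return uri.rsplit("/", 1)[-1]


uris = [
    "s3://bucket/prefix/file.jar",     # regular file
    "s3://bucket/prefix/sub/",         # folder marker (trailing slash)
    "s3://bucket/prefix/sub/lib.jar",  # nested file
]
names = [last_segment(u) for u in uris]
```

If empty entries get in the way, filter them out with a comprehension such as [n for n in names if n].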

Methods

| Method | Purpose |
| --- | --- |
| ls(path) | Top-level entries as full URIs |
| listRecursive(path) | All files recursively as full URIs |
| list(path, recursive=True) | Basenames (see table above) |
| list_file_paths(path) | Alias for listRecursive (full URIs) |
| exists(path) | True if file or folder-prefix exists |
| upload(local, remote) | Upload single file |
| download(remote, local) | Download single file |
| uploadFolder(localDir, remoteDir) | Recursive upload |
| downloadFolder(remoteDir, localDir) | Recursive download |
| copy(src, dest) | Server-side copy |
| move(src, dest, exclude_folders=None, rewrite_subfolders=None) (Python); move(src, dest) (Scala) | Move single file (single-object mode) or copy a directory tree with filtering (tree mode, see below) |
| rename(old, new) | Rename in place (same bucket/container) |
| delete(path) | Delete file or prefix |
| mkdir(path) | Create directory marker |
| content(path) | Read text content |
| read(path) | Read raw bytes (Python: bytes; Scala: Array[Byte]) |
| head(path, maxBytes=65536) | Read first N bytes as text |
| writeText(path, content) | Write a text file |
| stream(path) | Lazy read stream for large files |
| uploadStream(fileObj, path) | Upload from a file-like object. Returns the provider-native upload response (boto3 put_object dict, GCS Blob, Azure upload result) |
| PREFIX (property) | Filesystem URI prefix (s3://, gs://, abfs://) inferred from infrastructure |
| getBaseDir() | User's base workspace directory: /<bucket>/syn-workspace/workspaces/<workspace> |
| getLocalTempDir() | Local temporary directory (/tmp or platform equivalent) |

move(src, dest, exclude_folders=None, rewrite_subfolders=None) — full signature

Two modes, dispatched automatically:

  • Single-object move — when src does not end with / and no filter parameters are given. Performs copy + delete on the single object (true move semantic).
  • Tree mode (legacy copy-with-filter) — when src ends with / or either filter parameter is supplied. Lists every object under src, applies the filters, and copies each remaining object to the corresponding path under dest. Mirrors the legacy notebook S3FileSystem.move / GCSFileSystem.move semantics.

Heads-up: Tree mode does not delete the source. The legacy implementation was effectively a filtered tree-copy despite the move name — this implementation preserves that behaviour for backward compatibility. If you need a true tree-move, follow up with delete() on the source prefix.

| Parameter | Purpose |
| --- | --- |
| src | Source URI. Treated as a directory prefix when it ends with / or a filter parameter is supplied. |
| dest | Destination URI. |
| exclude_folders (Python only) | Folder names whose objects should be left in place (not copied) during a tree-mode call. Each entry is normalised to "<name>/" and substring-matched against the relative path, so exclude_folders=["logs"] excludes anything whose relative path contains "logs/". |
| rewrite_subfolders (Python only) | {old: new} substring replacements applied to the destination path via str.replace. |

# Single-object move (true move: copy + delete)
synutils.fs.move("s3://bucket/a/file.csv", "s3://bucket/b/file.csv")
# Tree copy with exclusions (legacy behaviour — does NOT delete source)
synutils.fs.move(
"s3://bucket/src/", "s3://bucket/dest/",
exclude_folders=["logs", "temp"],
)
# Tree copy with subfolder rename
synutils.fs.move(
"s3://bucket/src/", "s3://bucket/dest/",
rewrite_subfolders={"old_dir": "new_dir"},
)
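To make the tree-mode filter rules concrete, here is a pure-Python sketch of the planning step as described above: normalise each excluded name to "<name>/", substring-match it against the relative path, then apply the rewrite pairs to the destination with str.replace. plan_tree_copy is a hypothetical function that models the documented rules only; it performs no I/O and is not the library's code.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def plan_tree_copy(src_prefix: str, dest_prefix: str,
                   relative_paths: Iterable[str],
                   exclude_folders: Optional[List[str]] = None,
                   rewrite_subfolders: Optional[Dict[str, str]] = None
                   ) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs that a filtered tree copy would process."""
    markers = [name.rstrip("/") + "/" for name in (exclude_folders or [])]
    plan: List[Tuple[str, str]] = []
    for rel in relative_paths:
        # Substring match: "logs/" also matches "catalogs/".
        if any(marker in rel for marker in markers):
            continue
        dest = dest_prefix + rel
        for old, new in (rewrite_subfolders or {}).items():
            dest = dest.replace(old, new)
        plan.append((src_prefix + rel, dest))
    return plan


plan = plan_tree_copy(
    "s3://bucket/src/", "s3://bucket/dest/",
    ["data/a.csv", "logs/run.txt", "catalogs/c.csv", "old_dir/b.csv"],
    exclude_folders=["logs"],
    rewrite_subfolders={"old_dir": "new_dir"},
)
```

Note the side effect of substring matching in this model: excluding "logs" also skips catalogs/c.csv, because its relative path contains "logs/".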

Legacy aliases (Python only)

These names still work — they delegate to their canonical counterpart. Use the canonical name in new code.

| Canonical | Legacy name |
| --- | --- |
| upload(local, remote) | put(local, remote) |
| delete(path) | rm(path) |
| move(src, dest) | mv(src, dest) |
| exists(path) | exist(path) |
| exists(path) | is_exists(path) |
| mkdir(path) | create_folder(path) |
| uploadFolder(src, dest) | upload_folder(src, dest) |
| downloadFolder(src, dest) | download_folder(src, dest) |
| uploadStream(file, path) | upload_stream(file, path) |

Examples

Python

synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
print(synutils.fs.exists("gs://my-bucket/remote.csv"))
# Listing — three shapes
synutils.fs.list("gs://my-bucket/data/") # basenames (legacy AWS shape)
synutils.fs.ls("gs://my-bucket/data/") # full URIs (top level)
synutils.fs.listRecursive("gs://my-bucket/data/") # full URIs (recursive)
# Read raw bytes
data: bytes = synutils.fs.read("gs://my-bucket/config.bin")
# Inspect environment-derived properties
print(synutils.fs.PREFIX) # "gs://"
print(synutils.fs.getBaseDir()) # "/my-bucket/syn-workspace/workspaces/my-ws"
print(synutils.fs.getLocalTempDir()) # "/tmp"
# upload_stream — capture the response (e.g. ETag / VersionId)
with open("/tmp/data.csv", "rb") as f:
    resp = synutils.fs.upload_stream(f, "s3://my-bucket/data.csv")
print(resp["ETag"])

Scala

synutils.fs.upload("local.csv", "gs://my-bucket/remote.csv")
println(synutils.fs.exists("gs://my-bucket/remote.csv"))

// Listing — three shapes
synutils.fs.list("gs://my-bucket/data/").foreach(println) // basenames
synutils.fs.ls("gs://my-bucket/data/").foreach(println) // full URIs (top level)
synutils.fs.listRecursive("gs://my-bucket/data/").foreach(println) // full URIs (recursive)

// Read raw bytes
val data: Array[Byte] = synutils.fs.read("gs://my-bucket/config.bin")

// Inspect environment-derived properties
println(synutils.fs.PREFIX) // "gs://"
println(synutils.fs.getBaseDir()) // "/my-bucket/syn-workspace/workspaces/my-ws"
println(synutils.fs.getLocalTempDir()) // "/tmp"

spark — dataset → DataFrame + write helpers (requires Spark)

| Method | Purpose |
| --- | --- |
| createDataFrame(datasetName, from_date=None, to_date=None) | Read dataset into a DataFrame |
| isTableExists(dataset) | True if Hive table exists |
| createTable(df, name, partitionedDateColumn="") | Create Hive table from DataFrame |
| writeToEventStore(df, datasetName, ...) | Write a DataFrame to an event store. Auto-creates the dataset if it doesn't exist. |
| writeDatasetToEventStore(df, datasetName) | Convenience wrapper around writeToEventStore for a registered dataset (uses dataset defaults) |
| writeFileToEventStore(localPath, eventstorePath) | Push a local file into the event store |

Python

df = synutils.spark.createDataFrame("user_events") 
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")

Scala

val df = synutils.spark.createDataFrame("user_events") 
df.show(5)
synutils.spark.writeDatasetToEventStore(df, "user_events")

Details

writeToEventStore — full signature

Writes a DataFrame to a Hive-managed event store table. If the dataset doesn't already exist in the platform's metadata, it is created on the fly using the supplied fileFormat. Partitioning, compression, and process-mode handling are derived from the dataset definition; this method only orchestrates the Spark-side write.

Parameter | Type | Default | Purpose
df | DataFrame | - | Source DataFrame to write
datasetName | String | - | Dataset name in database.tablename format
numPartitions | Int | None (Py) / 0 (Scala) | If > 0 and the dataset is partitioned, adds DISTRIBUTE BY <partition_cols>, floor(rand()*numPartitions) to control output file count per partition
partitionedDateColumn | String | None (Py) / "" (Scala) | Override the dataset's configured partition column. If set, the dataset is updated to partition by this column before writing
isOverwrite | Boolean | True | True: INSERT OVERWRITE TABLE (replaces partition data); False: INSERT INTO TABLE (appends)
overrideProcessMode | Boolean | True | When True, recreates / re-initializes the table even if it exists. Set False to preserve an existing table definition
fileFormat | FileFormat / String | PARQUET | Used only when the dataset must be created (404 from the dataset API). Ignored if the dataset already exists. Supported: PARQUET, ORC, AVRO, DELTA, TEXTFILE

Python

from synutils.file_format import FileFormat
# Minimal write — dataset must already exist
synutils.spark.writeToEventStore(df, "analytics.user_events")
# Append (don't overwrite existing partitions)
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite=False)
# Control output file count per partition (e.g. 8 files per partition)
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions=8)
# Override the partition column for this write only
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn="event_date")
# Auto-create as Avro if dataset doesn't exist yet
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat=FileFormat.AVRO)
# Preserve existing table definition (don't recreate)
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode=False)

Scala

import com.syntasa.synutils.FileFormat 

// Minimal write
synutils.spark.writeToEventStore(df, "analytics.user_events")

// Append
synutils.spark.writeToEventStore(df, "analytics.user_events", isOverwrite = false)

// Control output file count per partition
synutils.spark.writeToEventStore(df, "analytics.user_events", numPartitions = 8)

// Override partition column for this write
synutils.spark.writeToEventStore(df, "analytics.user_events", partitionedDateColumn = "event_date")

// Auto-create as Avro
synutils.spark.writeToEventStore(df, "analytics.new_avro_dataset", fileFormat = FileFormat.AVRO.getValue())

// Preserve existing table definition
synutils.spark.writeToEventStore(df, "analytics.user_events", overrideProcessMode = false)

Heads-up: isOverwrite=True overwrites at the partition level for partitioned tables, not the entire table. For non-partitioned tables it overwrites the whole table.
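
To make the isOverwrite and numPartitions rows of the parameter table concrete, here is a rough sketch of how those two options could shape the generated SQL. This is not the library's implementation: build_insert_sql, the staging view name, and the dynamic PARTITION clause are all assumptions made for illustration. Only the INSERT OVERWRITE / INSERT INTO choice and the DISTRIBUTE BY expression come from the table above.

```python
def build_insert_sql(table, source_view, partition_cols=(),
                     is_overwrite=True, num_partitions=0):
    """Assemble an illustrative Hive INSERT statement (hypothetical helper).

    - is_overwrite picks INSERT OVERWRITE TABLE vs INSERT INTO TABLE.
    - num_partitions > 0 on a partitioned table appends
      DISTRIBUTE BY <partition_cols>, floor(rand()*num_partitions).
    The PARTITION (...) clause is an assumption about how a
    dynamic-partition write would be expressed.
    """
    verb = "INSERT OVERWRITE TABLE" if is_overwrite else "INSERT INTO TABLE"
    sql = f"{verb} {table}"
    if partition_cols:
        sql += f" PARTITION ({', '.join(partition_cols)})"
    sql += f" SELECT * FROM {source_view}"
    # num_partitions may be None (Python default) or 0 (Scala default): both skip the clause
    if partition_cols and num_partitions and num_partitions > 0:
        sql += (f" DISTRIBUTE BY {', '.join(partition_cols)}, "
                f"floor(rand()*{num_partitions})")
    return sql


print(build_insert_sql("analytics.user_events", "staging_view",
                       ["event_date"], is_overwrite=True, num_partitions=8))
print(build_insert_sql("analytics.user_events", "staging_view",
                       is_overwrite=False))
```

The random bucket in DISTRIBUTE BY spreads each partition's rows across at most num_partitions reducers, which is why it caps the number of output files per partition.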

lib — install packages and JARs

Python — installPyPI / installCondaPackage

Method | Purpose
installPyPI(packages, acrossAllNodes=True) | pip install. Auto-downloads s3://, gs://, and https:// URLs, plus .zip / .whl / .tar.gz filenames from the cluster's <bucket>/<config-folder>/deps/python/ folder. Multiple packages space-delimited.
installCondaPackage(packages, acrossAllNodes=True) | conda install (channel conda-forge). Multiple packages space-delimited.

Set acrossAllNodes=False to install on the kernel only (skip distribution to Spark worker nodes).

# 1. Regular PyPI package
synutils.lib.installPyPI("requests")
# 2. Multiple packages
synutils.lib.installPyPI("requests pandas>=2.0 numpy")
# 3. Local zip from the cluster's deps/python folder
# (resolves to <bucket>/<config-folder>/deps/python/simple_module.zip)
synutils.lib.installPyPI("simple_module.zip")
# 4. Full cloud path (s3://, gs://, or https://)
synutils.lib.installPyPI("s3://syntasa-k3s-dev/pradeepm/python_modules/simple_module.zip")
# 5. Mix everything in one call
synutils.lib.installPyPI("requests simple_module.zip s3://syntasa-k3s-dev/pradeepm/python_modules/other.whl")
# 6. Kernel-only install (don't distribute to Spark workers)
synutils.lib.installPyPI("requests", acrossAllNodes=False)
# 7. conda-forge package
synutils.lib.installCondaPackage("scipy")
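
The three kinds of entry that installPyPI accepts can be told apart from the string alone. The sketch below shows one way to do that, based only on the rules stated in the table (URL prefixes and archive suffixes). classify_package_specs is a hypothetical helper for illustration; how synutils actually resolves each entry is not documented here.

```python
CLOUD_PREFIXES = ("s3://", "gs://", "https://")
ARCHIVE_SUFFIXES = (".zip", ".whl", ".tar.gz")


def classify_package_specs(packages):
    """Split a space-delimited package string and label each entry.

    Labels (illustrative, not synutils terminology):
      "cloud_url"        - full s3://, gs://, or https:// path
      "deps_folder_file" - bare archive filename, looked up under
                           <bucket>/<config-folder>/deps/python/
      "pypi"             - anything else, passed to pip as a requirement
    """
    labelled = []
    for spec in packages.split():
        if spec.startswith(CLOUD_PREFIXES):
            kind = "cloud_url"
        elif spec.endswith(ARCHIVE_SUFFIXES):
            kind = "deps_folder_file"
        else:
            kind = "pypi"
        labelled.append((spec, kind))
    return labelled


for spec, kind in classify_package_specs(
        "requests pandas>=2.0 simple_module.zip s3://my-bucket/libs/other.whl"):
    print(f"{kind:18} {spec}")
```

The URL check runs first because a cloud path such as s3://my-bucket/libs/other.whl also ends in an archive suffix.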

Scala — installJars

Install JARs into the running Scala kernel.

Accepts three source styles, mixed freely in a single space-delimited call:

Source | Example
Maven coordinates | "org.joda:joda-money:1.0.4"
Filename in the cluster's deps folder (<bucket>/<config-folder>/deps/jars/) | "greeterWithDollar.jar"
Full cloud-storage path | "gs://syn-400-development-kub/pradeepm/greeter.jar"

// Single source
synutils.lib.installJars("org.joda:joda-money:1.0.4")
synutils.lib.installJars("greeterWithDollar.jar")
synutils.lib.installJars("gs://syn-400-development-kub/pradeepm/greeter.jar")

// Multiple, space-delimited
synutils.lib.installJars("org.joda:joda-money:1.0.4 greeterWithDollar.jar gs://syn-400-development-kub/pradeepm/greeter.jar")
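
The three source styles in the table can be distinguished by shape: a cloud path contains a URI scheme, a deps-folder entry is a bare .jar filename, and a Maven coordinate has the form group:artifact:version. The sketch below illustrates that distinction. It is written in Python to match the other sketches in this reference, and classify_jar_source is a hypothetical helper, not how installJars is known to work internally.

```python
def classify_jar_source(source):
    """Label one installJars source string (illustrative only).

    Returns a (kind, detail) tuple, where kind is one of
    "cloud_path", "deps_folder_file", "maven", or "unknown".
    """
    if "://" in source:
        # Checked first: a cloud path also contains ':' and ends in .jar
        return ("cloud_path", source)
    if source.endswith(".jar"):
        return ("deps_folder_file", source)
    parts = source.split(":")
    if len(parts) == 3 and all(parts):
        group, artifact, version = parts
        return ("maven", {"group": group, "artifact": artifact, "version": version})
    return ("unknown", source)


def classify_jar_sources(sources):
    """Apply classify_jar_source to each space-delimited entry."""
    return [classify_jar_source(s) for s in sources.split()]


for kind, detail in classify_jar_sources(
        "org.joda:joda-money:1.0.4 greeterWithDollar.jar gs://my-bucket/jars/greeter.jar"):
    print(kind, detail)
```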