Service Account: AWS IAM Policy Reference

For platform administrators configuring Service Accounts on the Syntasa platform.

When a Service Account (SA) is attached to a Notebook Workspace or Runtime Template, Syntasa uses the SA's credentials to access customer-owned data and AWS services, and to write Spark event history under the Syntasa system bucket. All other Syntasa-managed infrastructure (JAR/config staging, workspace metadata, log uploads, etc.) is handled by the cluster's own IAM role and requires no permissions on the SA.

This document defines the IAM policies you should attach to your SA's IAM User or IAM Role.

Layered access control - read this first

The SA needs IAM permission to call AWS services (S3, Glue, etc.) — without these, AWS rejects the calls before any Syntasa code runs. So the SA must have:

  • S3 access to the customer's data bucket
  • S3 access to syn-spark-history/ on the Syntasa system bucket (Spark drivers write event logs here)
  • Glue Catalog read access for any Spark SQL (SHOW DATABASES, SELECT FROM …) to work
  • Glue Catalog write access if users create or alter tables

On top of that, Syntasa Authorization controls which databases and tables each individual user can actually see and modify. Syntasa Authz is the user-level access control; the IAM policy is the identity-level access control for the SA itself.

Because user-level filtering is handled by Syntasa Authz, the recommended Glue scope is full catalog (*) — broad at the AWS layer, fine-grained at the Syntasa layer. If your security policy requires belt-and-suspenders, you can additionally scope the Glue Resource ARNs to specific databases at the AWS layer (see Scenario C).

Quick reference

If your users will… | Include this block | Required?
Read or write S3 data in the customer bucket | (1) Customer Data S3 | Required
Run any Spark job with event logging enabled (default for all notebooks and batch runs) | (2) Spark Event History S3 | Required
Run Spark SQL (SHOW DATABASES, SELECT FROM …), that is, every notebook user | (3) Glue Catalog Read | Required
Create / alter Glue tables and partitions | (4) Glue Catalog Write | Required if users create / alter tables
Run Athena queries directly from notebooks | (5) Athena Query | Required if users use Athena

Identity policy template

Replace placeholders with your actual values:

  • <CUSTOMER_DATA_BUCKET> — the S3 bucket holding customer data
  • <SYNTASA_SYSTEM_BUCKET> — the Syntasa-owned system bucket (the value of KERNEL_SYN_BUCKET / NOTEBOOK_SERVICE_STORAGE_BUCKET; ask your Syntasa platform team if unsure)
  • <REGION> — AWS region (e.g. us-east-1)
  • <ACCOUNT_ID> — your AWS account ID
  • <WORKGROUP_NAME> — Athena workgroup (only block 5)

Glue Resource ARNs default to * (full catalog). Syntasa Authorization handles which databases / tables each user can actually see. If you want IAM-level scoping in addition, replace * with specific database names (see Scenario C).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CustomerDataS3",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::<CUSTOMER_DATA_BUCKET>",
        "arn:aws:s3:::<CUSTOMER_DATA_BUCKET>/*"
      ]
    },
    {
      "Sid": "SparkEventHistoryS3List",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "syn-spark-history",
            "syn-spark-history/*"
          ]
        }
      }
    },
    {
      "Sid": "SparkEventHistoryS3Objects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/*"
    },
    {
      "Sid": "GlueCatalogRead",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions",
        "glue:SearchTables"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/*/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:userDefinedFunction/*/*"
      ]
    },
    {
      "Sid": "GlueCatalogWrite",
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:CreatePartition",
        "glue:UpdatePartition",
        "glue:DeletePartition",
        "glue:BatchCreatePartition",
        "glue:BatchUpdatePartition",
        "glue:BatchDeletePartition",
        "glue:CreateUserDefinedFunction",
        "glue:UpdateUserDefinedFunction",
        "glue:DeleteUserDefinedFunction"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/*/*",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:userDefinedFunction/*/*"
      ]
    },
    {
      "Sid": "AthenaQuery",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:ListQueryExecutions",
        "athena:GetWorkGroup"
      ],
      "Resource": [
        "arn:aws:athena:<REGION>:<ACCOUNT_ID>:workgroup/<WORKGROUP_NAME>"
      ]
    }
  ]
}

Remove GlueCatalogWrite if users only read tables. Remove AthenaQuery if users don't run Athena queries directly. The two SparkEventHistory blocks may be omitted only if Spark event logging is disabled platform-wide at the runtime template level.
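Filling the placeholders and dropping the optional blocks can be scripted. The following is a minimal sketch, assuming the template above has been saved as text; the function name render_policy and the values passed to it are illustrative and not part of the Syntasa platform.

```python
import json
import re

# Sids the text above describes as removable.
OPTIONAL_SIDS = {"GlueCatalogWrite", "AthenaQuery"}


def render_policy(template_text, values, drop_sids=()):
    """Fill <PLACEHOLDER> tokens and drop the named optional statements."""
    unknown = set(drop_sids) - OPTIONAL_SIDS
    if unknown:
        raise ValueError(f"refusing to drop non-optional statements: {sorted(unknown)}")

    rendered = template_text
    for name, value in values.items():
        rendered = rendered.replace(f"<{name}>", value)

    policy = json.loads(rendered)
    policy["Statement"] = [
        s for s in policy["Statement"] if s.get("Sid") not in drop_sids
    ]

    # Fail loudly if a placeholder survives in a statement that is kept.
    leftover = sorted(set(re.findall(r"<[A-Z_]+>", json.dumps(policy))))
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return policy
```

Because the placeholder check runs after the optional statements are removed, <WORKGROUP_NAME> only has to be supplied when the AthenaQuery block is kept.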

Why the SA needs write access to syn-spark-history/: when a Spark job is launched from a notebook or batch runtime with an SA attached, the driver writes its event log to s3://<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/ using the SA's credentials. Without this block the driver fails at SparkContext init with AccessDenied on PutObject. The path is fixed; only the bucket name varies per deployment.
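The SparkEventHistoryS3List statement limits ListBucket on the system bucket to requests whose prefix falls under syn-spark-history. The sketch below approximates how an IAM StringLike condition matches ('*' for any run of characters, '?' for exactly one) so you can see which list prefixes the statement admits. It is an illustration of the matching rule, not AWS's evaluator, and the helper names are hypothetical.

```python
import re

# Patterns copied from the SparkEventHistoryS3List statement.
ALLOWED_PREFIX_PATTERNS = ["syn-spark-history", "syn-spark-history/*"]


def string_like(pattern, value):
    """Approximate IAM StringLike: '*' matches any run, '?' exactly one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def list_allowed(prefix):
    """True if a ListBucket request with this prefix satisfies the condition."""
    return any(string_like(p, prefix) for p in ALLOWED_PREFIX_PATTERNS)


print(list_allowed("syn-spark-history/app-1/"))  # True
print(list_allowed("syn-workspace/"))            # False
print(list_allowed("syn-spark-history-old/"))    # False
```

Listing any other Syntasa prefix, or the bucket root, does not match either pattern, which is consistent with the cluster's own IAM role owning the rest of the bucket.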

Trust policy (for IAM Role SAs only)

If your SA is an IAM Role (recommended over IAM User — see the User Guide), attach this trust policy to the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<SYNTASA_ACCOUNT_ID>:role/<SYNTASA_INFRA_ROLE>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Replace <SYNTASA_ACCOUNT_ID> and <SYNTASA_INFRA_ROLE> with the values your Syntasa platform team provides.

Also required on the role:

  • MaxSessionDuration: 43200 seconds (12 hours)
  aws iam update-role --role-name <YourRoleName> --max-session-duration 43200

Optional (recommended for cross-account): require an external ID for the AssumeRole call so it cannot be triggered without Syntasa-side coordination. Ask your Syntasa platform team for the external ID value, then add a Condition element to the trust policy statement, alongside Effect, Principal, and Action:

"Condition": {
  "StringEquals": {
    "sts:ExternalId": "<EXTERNAL_ID>"
  }
}
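
To show where the condition sits in the finished document, here is a small sketch that assembles the trust policy with or without an external ID. The function name and the example account ID, role name, and external ID are placeholders for illustration; use the values your Syntasa platform team provides.

```python
import json


def build_trust_policy(syntasa_account_id, syntasa_infra_role, external_id=None):
    """Return the trust policy as a dict; the Condition is added only when requested."""
    statement = {
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{syntasa_account_id}:role/{syntasa_infra_role}"
        },
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # The Condition is a sibling of Effect, Principal, and Action.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Hypothetical values, for illustration only.
print(json.dumps(build_trust_policy("111122223333", "example-infra-role", "example-id"), indent=2))
```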

What the SA does NOT need

The following permissions are not required on the SA:

Permission | Why not
s3:* on Syntasa-owned bucket prefixes other than syn-spark-history/ (e.g. syntasa-config/, syntasa-logs/, syn-workspace/, syn-cluster-logs/, syn-file-uploads/, syn-volumes/) | The cluster's own IAM role handles all Syntasa-internal storage except Spark event history. Granting these to the SA is harmless but unnecessary.
emr:*, eks:* | Cluster control-plane operations run as the platform's IAM role.
sts:AssumeRole on the SA itself | The Syntasa infra role assumes your SA, not the other way around.
iam:* | Never required.

Common scenarios

Scenario A — Read-only notebook user (recommended default)

User runs Spark SQL against existing Glue tables; does not create or alter tables.

Include: (1) Customer Data S3 + (2) Spark Event History S3 + (3) Glue Catalog Read. Drop (4) Glue Catalog Write since the user doesn't need it.

Scenario B — Read / write notebook user

User reads from one bucket, writes results to another, and creates / alters Glue tables.

Include: (1) Customer Data S3 — list all relevant bucket ARNs in Resource, (2) Spark Event History S3, (3) Glue Catalog Read, (4) Glue Catalog Write.

Scenario C — Defense in depth (database-scoped Glue)

Same as A or B above, but your security policy requires AWS-layer scoping in addition to Syntasa Authz user-level filtering.

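Replace the database/* and table/*/* parts of the Glue Resource ARNs with the specific database name(s), for example database/marketing and table/marketing/*, and scope userDefinedFunction/*/* the same way (userDefinedFunction/marketing/*). Keep the catalog ARN in every statement, since Glue checks it alongside the database and table ARNs. Add a separate statement per database if you have several.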
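With several databases, writing the scoped statements by hand is error-prone. The sketch below generates one statement per database, as described above. The function name, the Sid numbering scheme, and the example database names are assumptions for illustration.

```python
def scoped_glue_statements(region, account_id, databases, actions, sid_prefix):
    """Build one Glue statement per database, each scoped to that database only."""
    base = f"arn:aws:glue:{region}:{account_id}"
    statements = []
    for index, db in enumerate(databases, start=1):
        statements.append({
            # Sids stay alphanumeric by numbering instead of embedding the name.
            "Sid": f"{sid_prefix}{index}",
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": [
                f"{base}:catalog",
                f"{base}:database/{db}",
                f"{base}:table/{db}/*",
                f"{base}:userDefinedFunction/{db}/*",
            ],
        })
    return statements


# Hypothetical databases; pass the full action list from block (3) or (4).
read_statements = scoped_glue_statements(
    "us-east-1", "111122223333", ["marketing", "sales"],
    ["glue:GetDatabase", "glue:GetTable"], "GlueCatalogRead",
)
```

The returned statements replace the single GlueCatalogRead (or GlueCatalogWrite) statement in the identity policy template.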

Scenario D — Athena power user

User runs Athena queries directly (not via Spark) from notebooks.

Include: (1), (2), (3), optionally (4), plus (5) Athena Query. Athena writes results to a workgroup-configured output location — make sure (1) covers that path too.
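A quick way to check the last point is to test the workgroup's output location against the Resource ARNs in block (1). The sketch below does this with a simplified wildcard match ('*' and '?' only); it is an approximation of IAM resource matching, and the function names and example buckets are hypothetical.

```python
import re
from urllib.parse import urlparse


def _wildcard_match(pattern, value):
    """Simplified IAM-style match: '*' is any run of characters, '?' is one."""
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def output_location_covered(output_location, resource_arns):
    """True if both the bucket and an object under the output prefix are covered."""
    parsed = urlparse(output_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {output_location!r}")
    bucket_arn = f"arn:aws:s3:::{parsed.netloc}"
    prefix = parsed.path.strip("/")
    # Athena writes result objects under the prefix, so test a representative key.
    sample_key = f"{prefix}/sample-result.csv" if prefix else "sample-result.csv"
    object_arn = f"{bucket_arn}/{sample_key}"
    bucket_ok = any(_wildcard_match(p, bucket_arn) for p in resource_arns)
    object_ok = any(_wildcard_match(p, object_arn) for p in resource_arns)
    return bucket_ok and object_ok


block_1_resources = ["arn:aws:s3:::acme-data", "arn:aws:s3:::acme-data/*"]
print(output_location_covered("s3://acme-data/athena-results/", block_1_resources))  # True
print(output_location_covered("s3://acme-athena/results/", block_1_resources))       # False
```

If the check returns False, add the output bucket's ARNs to the Resource list of block (1).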

Scenario E — Cross-account data bucket

Customer data lives in a different AWS account than the SA.

Cross-account S3 access requires permission on both sides:

  1. SA identity policy: the same (1) Customer Data S3 block, with the Resource ARNs naming the bucket in the data account. S3 bucket ARNs contain no account ID, so the ARN format is the same as for a same-account bucket.
  2. Bucket policy on the data bucket — must explicitly Allow the SA principal. Add this to the data bucket's bucket policy:
   {
     "Effect": "Allow",
     "Principal": {
       "AWS": "arn:aws:iam::<SA_ACCOUNT_ID>:role/<SA_ROLE_NAME>"
     },
     "Action": [
       "s3:GetObject",
       "s3:PutObject",
       "s3:DeleteObject",
       "s3:ListBucket"
     ],
     "Resource": [
       "arn:aws:s3:::<CUSTOMER_DATA_BUCKET>",
       "arn:aws:s3:::<CUSTOMER_DATA_BUCKET>/*"
     ]
   }

For IAM User SAs, use arn:aws:iam::<SA_ACCOUNT_ID>:user/<SA_USERNAME> as the principal.
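
The role and user variants differ only in the principal ARN. A minimal sketch that produces either form of the statement follows; the function name and example values are illustrative, and the action list mirrors the statement above.

```python
def bucket_policy_statement(sa_account_id, bucket, role_name=None, user_name=None):
    """Build the cross-account Allow statement for a role SA or a user SA."""
    if (role_name is None) == (user_name is None):
        raise ValueError("pass exactly one of role_name or user_name")
    if role_name is not None:
        principal = f"arn:aws:iam::{sa_account_id}:role/{role_name}"
    else:
        principal = f"arn:aws:iam::{sa_account_id}:user/{user_name}"
    return {
        "Effect": "Allow",
        "Principal": {"AWS": principal},
        "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
        ],
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
    }


# Hypothetical values; append the result to the bucket policy's Statement list.
statement = bucket_policy_statement("111122223333", "acme-data", role_name="example-sa-role")
```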

Validation checklist

After attaching the policy, verify the SA works end-to-end:

# In a Syntasa notebook attached to this SA:
spark.sql("SHOW DATABASES").show()  # Syntasa Authz returns only the databases visible to the user
spark.read.parquet("s3://<CUSTOMER_DATA_BUCKET>/some-path/").show()  # exercises CustomerDataS3
spark.sql("CREATE TABLE my_db.test AS SELECT 1 AS id").collect()  # exercises GlueCatalogWrite; skip for read-only SAs (Scenario A)
# Then check the Spark History Server UI for the run; an entry there confirms SparkEventHistoryS3 worked.
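
Before opening a notebook, an administrator can also pre-check the identity policy with the IAM policy simulator. The sketch below is one possible approach, not a Syntasa tool: it assumes boto3 is installed and that the caller (an admin identity, not the SA) is allowed to call iam:SimulatePrincipalPolicy. The helper names and sample object keys are hypothetical. ListBucket on the system bucket is left out because the simulator would need the s3:prefix context value to evaluate that condition, and cross-account bucket policies are not evaluated here.

```python
def default_checks(data_bucket, system_bucket, region, account_id):
    """Return (action, resource ARN) pairs touching blocks (1), (2), and (3)."""
    glue = f"arn:aws:glue:{region}:{account_id}"
    return [
        ("s3:ListBucket", f"arn:aws:s3:::{data_bucket}"),
        ("s3:GetObject", f"arn:aws:s3:::{data_bucket}/some-path/sample-object"),
        ("s3:PutObject", f"arn:aws:s3:::{system_bucket}/syn-spark-history/sample-event-log"),
        ("glue:GetDatabase", f"{glue}:database/my_db"),
    ]


def simulate_sa_policy(principal_arn, checks):
    """Return the checks the simulator does not report as allowed."""
    import boto3  # third-party; imported here so the pure helper above has no dependency

    iam = boto3.client("iam")
    denied = []
    for action, resource in checks:
        response = iam.simulate_principal_policy(
            PolicySourceArn=principal_arn,
            ActionNames=[action],
            ResourceArns=[resource],
        )
        for result in response["EvaluationResults"]:
            if result["EvalDecision"] != "allowed":
                denied.append((result["EvalActionName"], resource, result["EvalDecision"]))
    return denied
```

An empty list from simulate_sa_policy suggests the identity policy covers the sampled calls; the notebook checks above remain the end-to-end test, since they also exercise the trust policy and Syntasa Authz.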

If a test fails:

Error | Likely cause
AccessDenied … s3:ListBucket on the customer bucket | Missing or wrong bucket ARN in block (1)
AccessDenied … s3:PutObject on arn:aws:s3:::<SYNTASA_SYSTEM_BUCKET>/syn-spark-history/* (or driver fails at SparkContext init) | Block (2) missing: the SA can't write Spark event logs. Add the two SparkEventHistory statements, or disable Spark event logging at the runtime template level.
AccessDenied … s3:ListBucket on arn:aws:s3:::syntasa-* for any prefix other than syn-spark-history/ | This should not happen, because the cluster identity owns the other Syntasa prefixes. Contact Syntasa support if you see this.
AccessDenied … glue:GetDatabase | Block (3) missing, or its Resource ARN doesn't cover the database being queried (only relevant if you scoped it down per Scenario C; widen the ARN or add the missing database)
AccessDenied … glue:CreateTable (or similar write action) | Block (4) missing; add it if users create / alter tables
DurationSeconds exceeds MaxSessionDuration (IAM Role only) | Set role's MaxSessionDuration ≥ 43200
AssumeRole … is not authorized | Trust policy missing the Syntasa infra role principal

Note on user-level access: the IAM policy controls what the SA can call. Which databases and tables a user can actually see and modify is governed by Syntasa Authorization, which filters on top of the SA's IAM-level access. Granting the SA database/* in Glue does not expose all databases to all users.