Python Code Samples for the Spark Processor
📄️ How to Copy a CSV File from Google Cloud Storage to Amazon S3 Using Python
As organizations move toward hybrid and multi-cloud architectures, it’s increasingly common to work with data spread across multiple cloud providers. Two popular services in this domain are Google Cloud Storage (GCS) and Amazon S3. Sometimes, you may need to move data between them — for instance, to centralize analytics, archive logs, or trigger pipelines hosted on a different platform.
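As a preview of the approach that article walks through, here is a minimal sketch of such a copy using the `google-cloud-storage` and `boto3` client libraries. The bucket and object names are hypothetical, and credentials are assumed to come from Application Default Credentials on the GCS side and the standard AWS credential chain on the S3 side:

```python
def split_gcs_uri(uri: str):
    """Split a gs://bucket/path/to/object URI into (bucket, key)."""
    if not uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    bucket, _, key = uri[len("gs://"):].partition("/")
    return bucket, key

def copy_gcs_object_to_s3(gcs_uri: str, s3_bucket: str, s3_key: str) -> int:
    """Download one object from GCS into memory, then upload it to S3.

    Suitable for small-to-medium files; very large objects should be
    streamed or transferred in multipart chunks instead.
    """
    # Imported here so the pure helper above works without these packages.
    from google.cloud import storage  # pip install google-cloud-storage
    import boto3                      # pip install boto3

    gcs_bucket, gcs_key = split_gcs_uri(gcs_uri)
    data = storage.Client().bucket(gcs_bucket).blob(gcs_key).download_as_bytes()
    boto3.client("s3").put_object(Bucket=s3_bucket, Key=s3_key, Body=data)
    return len(data)

# Hypothetical usage:
# copy_gcs_object_to_s3("gs://my-gcs-bucket/exports/data.csv",
#                       "my-s3-bucket", "imports/data.csv")
```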
📄️ Reading a CSV file from GCS bucket on AWS
In this article, we will walk through the process of reading a CSV file stored in a Google Cloud Storage (GCS) bucket from an application running in an AWS environment using PySpark. This is particularly useful in multi-cloud architectures where data may reside in GCS while processing is done on AWS infrastructure.
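The core of that setup is pointing Spark's Hadoop layer at the GCS connector and a service-account key file. The sketch below uses hypothetical paths and assumes the GCS Hadoop connector jar is already on the Spark classpath:

```python
def gcs_spark_conf(keyfile_path: str) -> dict:
    """Hadoop settings that let Spark resolve gs:// paths via the GCS connector."""
    return {
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": keyfile_path,
    }

def read_gcs_csv(gcs_path: str, keyfile_path: str):
    """Build a Spark session configured for GCS and read a CSV from it."""
    # Imported here so gcs_spark_conf() is usable without a Spark install.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("read-gcs-from-aws")
    for key, value in gcs_spark_conf(keyfile_path).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return spark.read.option("header", "true").csv(gcs_path)

# Hypothetical usage:
# df = read_gcs_csv("gs://my-gcs-bucket/data/input.csv",
#                   "/path/to/service-account.json")
# df.show(5)
```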
📄️ Creating a Simple Propensity Model in Syntasa App
How to Run Code in a Syntasa App
📄️ Reading CSV files from S3 bucket (Instance profile Disabled)
In the previous article, we explored how to read a CSV file from an S3 bucket located within the same AWS environment, using an instance profile for authentication. In some cases, the S3 bucket you want to access is not hosted within the same AWS environment as your Syntasa application. This typically happens when the S3 bucket belongs to another account or region, or when you're working across environments.
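In that scenario, the typical pattern is to supply explicit AWS access keys through Spark's `s3a` configuration instead of relying on an instance profile. A minimal sketch, with hypothetical values (in practice the keys should come from a secret store, never source code):

```python
def s3_key_conf(access_key: str, secret_key: str) -> dict:
    """s3a settings for authenticating with explicit AWS access keys."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    }

def read_external_s3_csv(path: str, access_key: str, secret_key: str):
    """Read a CSV from an S3 bucket outside the current AWS environment."""
    # Imported here so s3_key_conf() is usable without a Spark install.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("read-external-s3")
    for key, value in s3_key_conf(access_key, secret_key).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return spark.read.option("header", "true").csv(path)

# Hypothetical usage (keys fetched from a secrets manager):
# df = read_external_s3_csv("s3a://other-account-bucket/data/input.csv",
#                           aws_access_key, aws_secret_key)
```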
📄️ Reading CSV File from S3 Bucket (Instance Profile Enabled)
In Syntasa, Spark processors can be configured to access S3 buckets without hardcoding credentials by leveraging instance profiles. Before running this Spark code within the Syntasa application, it’s common to first test and validate the logic in a Syntasa JupyterLab notebook. If you’d like to explore or run the notebook version of this process first, refer to the following guide: Reading & Writing a CSV File
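For a quick preview of that setup: with an instance profile attached, no keys need to be configured at all, because the `s3a` filesystem picks up the role credentials automatically. A minimal sketch with a hypothetical bucket path:

```python
def is_s3a_path(path: str) -> bool:
    """Spark on Hadoop typically addresses S3 via the s3a:// scheme."""
    return path.startswith("s3a://")

def read_s3_csv(path: str):
    """Read a CSV from S3 using the instance profile's credentials."""
    # Imported here so is_s3a_path() is usable without a Spark install.
    from pyspark.sql import SparkSession

    if not is_s3a_path(path):
        raise ValueError("expected an s3a:// path")
    spark = SparkSession.builder.appName("read-s3-instance-profile").getOrCreate()
    # No access keys are set anywhere: the instance profile supplies them.
    return spark.read.option("header", "true").option("inferSchema", "true").csv(path)

# Hypothetical usage:
# df = read_s3_csv("s3a://my-s3-bucket/data/input.csv")
# df.printSchema()
```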
📄️ Reading Data from Snowflake with Notebooks
This guide explains how to read data from a Snowflake table using a notebook.
📄️ Writing Data to Snowflake with Spark Processor
This guide explains how to extract data using Python Spark (PySpark) SQL, transform the data, and write it to a Snowflake table.
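As a preview, writing with the Snowflake Spark connector comes down to a `df.write.format(...)` call with a set of `sf*` connection options. A minimal sketch with hypothetical connection values, assuming the Snowflake Spark connector and JDBC driver jars are available on the cluster:

```python
def snowflake_options(account_url: str, user: str, password: str,
                      database: str, schema: str, warehouse: str) -> dict:
    """Connection options understood by the Snowflake Spark connector."""
    return {
        "sfURL": account_url,    # e.g. "myaccount.snowflakecomputing.com"
        "sfUser": user,
        "sfPassword": password,  # prefer a secrets manager in practice
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

def write_to_snowflake(df, sf_options: dict, table: str):
    """Overwrite (or create) a Snowflake table with the DataFrame's contents."""
    (
        df.write.format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", table)
        .mode("overwrite")
        .save()
    )

# Hypothetical usage, after transforming a DataFrame with PySpark SQL:
# opts = snowflake_options("myaccount.snowflakecomputing.com", "etl_user", "****",
#                          "ANALYTICS", "PUBLIC", "COMPUTE_WH")
# write_to_snowflake(transformed_df, opts, "ORDERS_SUMMARY")
```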
📄️ Reading data from Snowflake with Spark Processor
This guide explains how to read data using Python Spark (PySpark) SQL and write it to a Hive table.
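As a preview, the read side mirrors the write side: the connector runs a query in Snowflake and returns the result as a DataFrame, which can then be saved as a Hive table. A minimal sketch with hypothetical table and database names, assuming the Snowflake Spark connector jars are available and the Spark session is Hive-enabled:

```python
def qualified_table_name(database: str, table: str) -> str:
    """Build the database.table identifier used by saveAsTable."""
    return f"{database}.{table}"

def snowflake_query_to_hive(spark, sf_options: dict, query: str,
                            hive_database: str, hive_table: str):
    """Run a query in Snowflake and persist the result as a Hive table."""
    df = (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**sf_options)  # same sfURL/sfUser/... options as for writing
        .option("query", query)
        .load()
    )
    df.write.mode("overwrite").saveAsTable(
        qualified_table_name(hive_database, hive_table)
    )
    return df

# Hypothetical usage:
# snowflake_query_to_hive(spark, sf_opts,
#                         "SELECT * FROM ANALYTICS.PUBLIC.ORDERS",
#                         "analytics_db", "orders")
```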