Amazon S3 reverse ETL source

Send data from Amazon S3 to your entire stack.

Amazon S3 is a cloud-based object storage service that lets businesses securely store their data at scale.

Setting up the S3 source

  1. Log in to your RudderStack dashboard.
  2. From the left panel, go to Directory > Sources > Reverse ETL. Then, select Amazon S3.
  3. Assign a name and click Continue.

Connection credentials

Configure the following settings to authenticate RudderStack to access your S3 account:

  • Connection Mode: RudderStack provides the following options to connect to S3:
    • Cross-Account Role (recommended): This option lets you connect to S3 through an IAM access role. To do so, you need to first create an IAM role for RudderStack with the required permissions to access your S3 account. Refer to the Creating the RudderStack IAM Role for S3 section below for the detailed steps.
    • Access Key: This option lets you connect to S3 using your AWS access key ID and secret access key.
It is highly recommended to use the Cross-Account Role method for connecting to S3 as the Access Key method will be deprecated soon.
  • Account Name: Specify a name that will be used to identify the connection account.
  • Role ARN: If you select the Cross-Account Role (recommended) connection mode, specify the ARN after creating the RudderStack IAM role.
  • AWS Access Key ID: If you select the Access Key connection mode for authenticating RudderStack, specify your AWS access key ID. For more information on obtaining your access key ID and secret access key, refer to the FAQ section below.
  • AWS Secret Access Key: Enter the corresponding secret access key.
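Before pasting a value into the Role ARN field, a quick format check can catch copy-paste mistakes. The sketch below is illustrative only: the function name `looks_like_role_arn` is hypothetical, and the pattern assumes the standard IAM role ARN layout (`arn:<partition>:iam::<12-digit account ID>:role/<name>`). It does not confirm that the role exists or that RudderStack can assume it.

```python
import re

# Assumed layout of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
_ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the general shape of an IAM role ARN."""
    return _ROLE_ARN.fullmatch(value.strip()) is not None


# A role ARN passes the check; an S3 bucket ARN does not.
print(looks_like_role_arn("arn:aws:iam::123456789012:role/rudderstack-s3"))
print(looks_like_role_arn("arn:aws:s3:::my-bucket"))
```

The account ID and role name in the example calls are placeholders, not values from your setup.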

S3 permissions

The minimum S3 permissions that must be attached to the IAM role or the access keys (depending on your connection method) are listed below:

"Action": [

Schedule settings

Specify the Schedule Settings to schedule the data syncs from your S3 source.

RudderStack lets you schedule data syncs for your Reverse ETL sources and specify how and when the syncs will run. For more information on the Basic, CRON, and Manual schedule types, refer to the Sync Schedule guide.

Connecting to a destination

Once you successfully set up your S3 source, you can connect it to your preferred destination by clicking the Add Destination button:

Add destination in RudderStack

Specifying the data to import

While configuring the destination, specify the following bucket configuration settings needed for RudderStack to import the data and sync it to the connected destination:

  • S3 Bucket Name: Enter the name of the S3 bucket.
  • Prefix: Prefix refers to the path within your S3 bucket from where RudderStack will import the data. For example, if Prefix is set to RUDDER, then RudderStack will import the data stored in the location <your_s3_bucket>/RUDDER.
Bucket configuration settings
Your S3 bucket (with the prefix, if specified above) should contain only Apache Parquet files, as RudderStack can extract only Parquet files. Also, the first row of each Parquet file should not have a null value for any column (empty strings are allowed). This helps RudderStack determine the correct schema of the file.
  • Choose user identifier: Choose a user identifier for user_id and/or anonymous_id from the dropdown.
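The Prefix and first-row rules above can be sketched in a few lines of Python. This is a minimal illustration, not RudderStack's implementation: the helper names `import_location` and `null_columns_in_first_row` are hypothetical, and the first row is assumed to have already been read from the Parquet file into a dictionary by a tool of your choice.

```python
from typing import Any, List, Mapping


def import_location(bucket: str, prefix: str = "") -> str:
    """Build the <bucket>/<prefix> location described for the Prefix setting."""
    prefix = prefix.strip("/")
    return f"{bucket}/{prefix}" if prefix else bucket


def null_columns_in_first_row(first_row: Mapping[str, Any]) -> List[str]:
    """List the columns whose first-row value is null; empty strings are allowed."""
    return [column for column, value in first_row.items() if value is None]


print(import_location("<your_s3_bucket>", "RUDDER"))
# "email" is an empty string (allowed); "plan" is null and would be reported.
print(null_columns_in_first_row({"user_id": "u1", "email": "", "plan": None}))
```

An empty result from `null_columns_in_first_row` means the first row satisfies the rule described above.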

Once you specify the above settings, you will be able to preview a snippet of your data, as shown below:

Data snippet preview

Here, you can select all or only specific columns of your choice, search the columns by a keyword, and also edit the JSON Trait Key. You can also preview the resulting JSON on the right.
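To make the effect of column selection and JSON Trait Key edits concrete, here is a small sketch of how one row might turn into JSON. The payload shape, the column names, and the `build_traits` helper are all assumptions made for illustration; the JSON that RudderStack actually produces is the one shown in the dashboard preview.

```python
import json
from typing import Any, Dict, Mapping


def build_traits(row: Mapping[str, Any], trait_keys: Mapping[str, str]) -> Dict[str, Any]:
    """Keep only the selected columns and rename each one to its JSON trait key."""
    return {trait_keys[col]: row[col] for col in trait_keys if col in row}


# Hypothetical source row with three columns.
row = {"id": "u1", "email_address": "a@example.com", "internal_note": "skip me"}

# Selected columns mapped to (hypothetical) JSON trait keys; unselected columns are dropped.
trait_keys = {"email_address": "email"}

# Assumed payload shape: the chosen user identifier plus the mapped traits.
payload = {"user_id": row["id"], "traits": build_traits(row, trait_keys)}
print(json.dumps(payload, indent=2))
```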

Add constant option in RudderStack dashboard
As an alternative to JSON mapping, you can send the data to the destination using the Visual Data Mapper feature. However, this feature is currently supported only for select destinations.

Updating an existing configuration

  1. Go to the Schema tab of your configured source and click Update.
  2. Update your column selection.
When updating the configuration, you can only change the existing mappings. The S3 Bucket Name, Prefix, and User Identifier fields are not editable.
  3. Finally, click the Save button.
After updating the configuration, the next sync will be a full sync.

Creating the RudderStack IAM role

Follow the steps in this section to create a RudderStack IAM role and obtain the role ARN.

Creating the policy

To create a managed policy defining the permissions for the RudderStack IAM role, follow these steps:

  1. Sign in to your AWS Management Console and open the IAM console.
  2. In the left navigation pane, click Policies followed by Create policy.
  3. In the JSON tab, paste the following policy:
  "Version": "2012-10-17",
  "Statement": [{
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
      "Effect": "Allow",
      "Action": [
      "Resource": "*"
  4. Click Review policy. On the Review page, enter read-write-app-bucket as the policy name.

Creating the IAM role

  1. In the left navigation pane, click Roles and go to Create role.
  2. Under Trusted entity type, select AWS account:
Setting up AWS IAM Role for RudderStack
  3. Select Another AWS account and under Account ID, enter 422074288268, the account ID associated with RudderStack.
  4. Under Options, check Require external ID and enter your workspace ID as the External ID.
Setting up AWS IAM Role for RudderStack
  5. Review all settings carefully and click Next to proceed.
  6. In the Permissions window, select the check box next to the policy you created in the Creating the policy section above.
  7. Review all settings carefully and click Next to proceed.
  8. Enter a unique name for your role. Note that this name is case-insensitive. For example, you cannot create a role named RUDDERSTACK if rudderstack already exists.
You cannot edit the name of the role after it has been created.
  9. Optional: Enter the description for this role.
  10. Click Create role to complete the setup.
  11. Finally, copy the ARN of this newly created role and paste it in the Role ARN field in the dashboard settings.
Refer to the AWS IAM tutorial for more information on delegating access across AWS accounts using IAM roles.
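For reference, the console steps above should result in a trust relationship on the role along the following lines. This sketch assumes the standard AWS cross-account form (an account principal plus an `sts:ExternalId` condition); the `build_trust_policy` helper and the workspace ID placeholder are hypothetical, so compare the output with what the IAM console shows for your role rather than treating it as authoritative.

```python
import json

RUDDERSTACK_ACCOUNT_ID = "422074288268"  # account ID given in the steps above


def build_trust_policy(workspace_id: str) -> dict:
    """Trust policy letting the RudderStack account assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed principal form for "Another AWS account" in the console.
                "Principal": {"AWS": f"arn:aws:iam::{RUDDERSTACK_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # "Require external ID" maps to this condition.
                "Condition": {"StringEquals": {"sts:ExternalId": workspace_id}},
            }
        ],
    }


# Replace the placeholder with your own workspace ID.
print(json.dumps(build_trust_policy("<your_workspace_id>"), indent=2))
```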


Failing syncs containing large row groups

Note that Reverse ETL syncs fail if your Parquet files contain row groups larger than 512 MB. This is because S3 cannot process Parquet files with row groups above that size.

Make sure that:

  • The maximum record length in the input or result is 1 MB.
  • The maximum uncompressed row group size is 512 MB.

See S3 documentation for more information on these limits.
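A pre-flight check against these two limits could look like the sketch below. It assumes you have already obtained the uncompressed row group sizes and record lengths in bytes (for example, from the file metadata exposed by a Parquet library), and that the limits are measured in binary megabytes. The function names are hypothetical.

```python
from typing import Iterable, List

MB = 1024 * 1024  # assumption: the limits are measured in binary megabytes
MAX_ROW_GROUP_BYTES = 512 * MB
MAX_RECORD_BYTES = 1 * MB


def oversized_row_groups(uncompressed_sizes: Iterable[int]) -> List[int]:
    """Return the indexes of row groups whose uncompressed size exceeds 512 MB."""
    return [i for i, size in enumerate(uncompressed_sizes) if size > MAX_ROW_GROUP_BYTES]


def record_too_long(record_bytes: int) -> bool:
    """Return True if a single record is longer than 1 MB."""
    return record_bytes > MAX_RECORD_BYTES


# Hypothetical sizes: only the second row group is over the limit.
sizes = [128 * MB, 600 * MB, 512 * MB]
print(oversized_row_groups(sizes))
print(record_too_long(2 * MB))
```

If `oversized_row_groups` returns any indexes, rewriting the file with smaller row groups before the sync is the likely remedy.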


FAQ

Where can I obtain the AWS Access Key ID and the AWS Secret Access Key?

  1. Sign in to your AWS Management Console as the root user.
  2. From the upper right corner, click your account and go to Security Credentials. You can find your access key ID listed here. You can also create a new access key by clicking the Create access key button:
AWS security

For more information on these AWS credentials, refer to the AWS documentation.

For setting up the S3 source, some S3 actions must be attached to your access keys. For more information on these actions, refer to the S3 permissions section above.
