aws handling of s3:// URLs does not seem to match guidance
Describe your issue
Unable to use `awss3sink` with AWS S3, whether specifying the `region`, `bucket`, and `key` properties individually or setting the `uri` property altogether to `s3://bucket/key`. The element will not go to the `PLAYING` state, logging the message: "Could not open resource for writing."
Upon investigation, the `s3url.rs` parser expects the format to be `s3://region/bucket/key/version`, but what AWS documentation I could find on the matter (the `aws s3 help` CLI command args) suggests that the supported formats are:
- Location: `s3://bucket[/prefix]/key`
- Access Point: `s3://arn:aws:region:id:accesspoint/bucket[/prefix]/key`
This is also the format shown in the AWS S3 web console for objects.
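For illustration, a minimal sketch (hypothetical, not the actual `s3url.rs` code) of parsing the documented Location form, where the host component is the bucket rather than the region:

```rust
// Hypothetical sketch, not the actual s3url.rs implementation: parse the
// AWS-documented Location form s3://bucket[/prefix]/key. The host component
// is the bucket; the region never appears in the URI.
fn parse_s3_uri(uri: &str) -> Option<(String, String)> {
    let rest = uri.strip_prefix("s3://")?;
    let (bucket, key) = rest.split_once('/')?;
    if bucket.is_empty() || key.is_empty() {
        return None;
    }
    Some((bucket.to_string(), key.to_string()))
}

fn main() {
    // The first path component is the bucket, not the region.
    assert_eq!(
        parse_s3_uri("s3://my-bucket/videos/file.ogv"),
        Some(("my-bucket".to_string(), "videos/file.ogv".to_string()))
    );
    // A URI without an object key is not valid.
    assert_eq!(parse_s3_uri("s3://my-bucket"), None);
}
```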
I could find no supporting documentation for the existing implementation; however, there are several examples related to virtual-hosted vs. the deprecated path-style addressing, all of which refer to accessing objects over HTTP(S), not `s3://`. Moreover, the virtual-hosted vs. path-style change-over does not seem to apply to `s3://` URLs (or at least, that scheme is not mentioned in those documents).
I went to try the examples from the element's README, but they seem to be copy-pasted from the other `s3sink` element in a different plugin. (The examples do not work when substituting `awss3sink` accordingly; the same error happens.)
NOTE: As pointed out below, my initial failure with the examples came from environment variable misconfiguration. It seems that `AWS_ENDPOINT_URL` is being read in somewhere, perhaps when creating the client's config object. Because of this, when I thought I was testing example `gst-launch` commands against my AWS buckets, they were actually trying to reach the address in that URL (MinIO), which failed because it was running with `MINIO_DOMAIN=localhost`, which is not an FQDN.
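Given that sensitivity, a quick sanity check before testing helps; here is a sketch in plain Rust (where exactly the element or SDK picks the variable up is my assumption, not confirmed):

```rust
use std::env;

fn main() {
    // If AWS_ENDPOINT_URL is exported, the SDK-built client may silently
    // target that address instead of AWS S3, reproducing the failure above.
    match env::var("AWS_ENDPOINT_URL") {
        Ok(url) => eprintln!(
            "warning: AWS_ENDPOINT_URL={url} is set; requests may go there \
             instead of AWS S3 -- unset it before testing against real buckets"
        ),
        Err(_) => println!("AWS_ENDPOINT_URL is not set"),
    }
}
```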
Expected Behavior
The examples in the readme should work once the reader properly configures their AWS access.
The root cause of going down this rabbit hole was that I was trying to use `awss3sink` with AWS-S3-formatted `s3://` URIs, which did not work because the element was parsing out my bucket name as the region, even though I was setting `region=` explicitly. The property says it's the S3 object URL, so my expectation was that the same URL I see in the web console should work here, and it does not.
Observed Behavior
`gst-launch-1.0` would report that the element could not reach the `PLAYING` state. Enabling logging, one can observe the aforementioned message that the element could not access the specified destination.
Setup
- Operating System: Ubuntu 22.04
- Device: Computer
- gst-plugins-rs Version: 0.12.4
- GStreamer Version: 1.24.2
- Command line: bash 5.1.6
Steps to reproduce the bug
Run the examples with an AWS-S3-conformant object URL. There seems to be no way to configure a different region.
```
gst-launch-1.0 \
  videotestsrc ! \
  theoraenc ! \
  oggmux ! \
  awss3sink uri=s3://<your bucket>/file.ogv sync=false
```
Removed example where `region`, `bucket`, and `key` were specified explicitly. This does seem to work so long as you do not have `AWS_ENDPOINT_URL` exported into the environment, pointed at somewhere other than AWS that does not support virtual-hosted paths. There is an unexpected environment sensitivity here, since the behavior of `awss3sink` changed without the `endpoint-uri` property being set.
How reproducible is the bug?
Always.
Screenshots if relevant
Solutions you have tried
The "notes" below were the original solutions tried. As noted in the edits above, much of the trouble was caused first by how the element treats the `uri` and `region` properties, and second by its apparent sensitivity to `AWS_ENDPOINT_URL` being set in the environment.
I have tried running a MinIO Docker container locally, configuring it with my AWS access key information, setting its region to `us-west-2`, and adding a bucket. When repeating the above examples with the additional property `endpoint-uri=http://localhost:9000`, I get the same failure. Alternatively, if I use the AWS CLI to try uploading to my S3 instance or MinIO, I have to specify the region separately from the S3 URL. For the MinIO example:
```
aws --endpoint-url http://localhost:9000 --region us-west-2 s3 cp <file name> s3://bucket/<file name>
```
In other words, both MinIO and AWS S3 handle `s3://bucket/key`, and both reject `s3://region/bucket/key`.
Somewhat incorrect here about MinIO's behavior: MinIO was rejecting the calls because the internals of the SDK default to virtual-hosted paths (the alternative has been deprecated for years), and MinIO only supports virtual-hosted paths if `MINIO_DOMAIN` is set to an FQDN, which I did not have in this case (running on localhost). As for why the AWS S3 CLI rejects having a region in the host field: that is simply not what an AWS S3 object URL (`s3://`) is expected to contain. The only relationship between this element's `uri` property and that URL is that the element strictly enforces the `s3://` prefix, and it then further enforces a schema that does not conform to what AWS S3 shows in the CLI, web console, etc.
From my perspective, then, the 'fix' is this: if the `uri` is supposed to be the AWS S3 object URI (`s3://...`), then the parsing should match the two noted `S3Uri` formats described in the CLI help (`aws s3 help`), as that is the only reference I could find on the matter other than the AWS web console. Such a change would further require independent handling of the `region` property when `uri` is not set to an "access point" ARN. There should probably also be some investigation into which other `AWS_...` environment variables are being applied to the client configuration separately from what is specified in the element's properties (e.g., the endpoint URL).
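To make the suggestion concrete, here is a hypothetical sketch of what such parsing could look like. The ARN handling is simplified to the `S3Uri` shape quoted from `aws s3 help` above (real ARNs carry more fields), and this is not a proposed patch to `s3url.rs`:

```rust
// Hypothetical sketch of the suggested parsing: accept the two S3Uri forms
// from `aws s3 help` and leave the region to a separate element property.
#[derive(Debug, PartialEq)]
enum S3Target {
    // s3://bucket[/prefix]/key -- region must come from the `region` property
    Bucket { bucket: String, key: String },
    // s3://arn:aws:region:id:accesspoint/name[/prefix]/key -- region is in the ARN
    AccessPoint { arn: String, key: String },
}

fn parse_s3_target(uri: &str) -> Option<S3Target> {
    let rest = uri.strip_prefix("s3://")?;
    if rest.starts_with("arn:aws:") {
        // The ARN runs through the access-point name; everything after the
        // next '/' is the object key. (Simplified per the shape quoted above.)
        let ap = rest.find("accesspoint/")? + "accesspoint/".len();
        let key_start = ap + rest[ap..].find('/')?;
        return Some(S3Target::AccessPoint {
            arn: rest[..key_start].to_string(),
            key: rest[key_start + 1..].to_string(),
        });
    }
    // Plain form: the host component is the bucket, the remainder is the key.
    let (bucket, key) = rest.split_once('/')?;
    (!bucket.is_empty() && !key.is_empty()).then(|| S3Target::Bucket {
        bucket: bucket.to_string(),
        key: key.to_string(),
    })
}

fn main() {
    assert_eq!(
        parse_s3_target("s3://my-bucket/videos/file.ogv"),
        Some(S3Target::Bucket {
            bucket: "my-bucket".to_string(),
            key: "videos/file.ogv".to_string(),
        })
    );
    assert_eq!(
        parse_s3_target("s3://arn:aws:us-west-2:123456789012:accesspoint/my-ap/file.ogv"),
        Some(S3Target::AccessPoint {
            arn: "arn:aws:us-west-2:123456789012:accesspoint/my-ap".to_string(),
            key: "file.ogv".to_string(),
        })
    );
}
```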