Amazon S3 file upload with cURL and Java code

Post summary: Working Java code to upload a file to an Amazon S3 bucket with cURL.

This post gives a solution to a very rare use case: using cURL from Java code to upload a file to an Amazon S3 bucket.

Amazon S3

Amazon Simple Storage Service is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.

Upload to Amazon S3 with Java

Amazon S3 has very good documentation on how to use its Java client to upload or retrieve files. The ways described in the manual are:

  • Upload in a single operation – a few lines of code to instantiate an AmazonS3Client object and upload the file in one chunk.
  • Upload using multipart upload API – provides the ability to upload a file in several chunks. Both low-level and high-level operations are available for the upload.
  • Upload using pre-signed URLs – with this approach you can upload to someone else’s bucket without having the access key and secret key.

More on each of the approaches can be found in the Amazon S3 upload object manual.
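As a rough illustration of the third approach, the sketch below sends a PUT request to a pre-signed URL using only JDK classes. It assumes the URL was generated elsewhere by the bucket owner. The class and method names (PreSignedUrlUpload, preparePut, upload) are hypothetical and are not part of the AWS SDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: hypothetical helper, not the AWS SDK API.
final class PreSignedUrlUpload {

	// Configures a PUT request; no network connection is opened here.
	static HttpURLConnection preparePut(String preSignedUrl, long contentLength) {
		try {
			HttpURLConnection connection = (HttpURLConnection)
					URI.create(preSignedUrl).toURL().openConnection();
			connection.setRequestMethod("PUT");
			connection.setDoOutput(true);
			// Stream the body instead of buffering the whole file in memory.
			connection.setFixedLengthStreamingMode(contentLength);
			return connection;
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}

	// Streams the file to the pre-signed URL and returns the HTTP status code.
	static int upload(String preSignedUrl, Path file) throws IOException {
		HttpURLConnection connection = preparePut(preSignedUrl, Files.size(file));
		try (InputStream in = Files.newInputStream(file);
				OutputStream out = connection.getOutputStream()) {
			in.transferTo(out); // requires Java 9 or newer
		}
		try {
			return connection.getResponseCode();
		} finally {
			connection.disconnect();
		}
	}
}
```

If the URL was signed for a specific Content-Type, the same header would presumably have to be set on the connection before the body is sent.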

Upload with cURL

cURL is a widely used and powerful tool for data transfer. It supports various protocols and features. It is less common in the Windows world, but there are cURL implementations for Windows that can be used. Uploading to an Amazon S3 bucket with cURL is pretty easy, and there are bash scripts that can do it. One such script is described in File Upload on Amazon S3 server using CURL request post. It is the basis I took to create the Java code for the upload.
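Since cURL may not be installed on every machine, particularly on Windows, it can be worth checking for it before relying on it. The following is a minimal sketch that runs the executable with --version and checks the exit code. CurlCheck and isAvailable are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch only: hypothetical helper for probing an executable.
final class CurlCheck {

	// Returns true if "<executable> --version" starts and exits with code 0.
	static boolean isAvailable(String executable) {
		try {
			Process process = new ProcessBuilder(executable, "--version")
					.redirectErrorStream(true) // merge stderr into stdout
					.start();
			try (InputStream out = process.getInputStream()) {
				// Drain the output so the process cannot block on a full pipe
				// (readAllBytes requires Java 9 or newer).
				out.readAllBytes();
			}
			return process.waitFor() == 0;
		} catch (IOException e) {
			// The executable was not found or could not be started.
			return false;
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			return false;
		}
	}
}
```

A caller could use CurlCheck.isAvailable("curl") as a guard and fall back to one of the Java client approaches when it returns false.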

Upload with cURL and Java

Upload with Java and cURL is a pretty rare case. The benefits of this approach are memory and CPU optimization. If an upload is done through Java code, the file to be uploaded is read and stored in the heap. This reading is optimized, and parts that are already uploaded are removed from the heap. Still, reading the file and discarding the parts that are no longer needed requires memory to hold them and CPU for garbage collection, especially when a huge amount of data is transferred. In cases where resources and performance are absolutely important, this memory and CPU usage can be critical. cURL also uses memory to upload the file, but that is then a concern of the OS rather than of the JVM. The upload Java code is:

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.file.Path;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

import org.apache.commons.io.IOUtils;

public class AmazonS3CurlUploader {

	private static final String ALGORITHM = "HmacSHA1";
	private static final String CONTENT_TYPE = "application/octet-stream";
	private static final String ENCODING = "UTF8";

	public boolean upload(Path localFile, String s3Bucket, String s3FileName,
			String s3AccessKey, String s3SecretKey) {
		boolean result;
		try {
			Process cURL = createCurlProcess(localFile, s3Bucket,
					s3FileName, s3AccessKey, s3SecretKey);
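			// Drain the output before waiting: if the pipe buffer fills up,
			// cURL blocks on writing and waitFor() never returns.
			String response = IOUtils.toString(cURL.getInputStream(), ENCODING)
					+ IOUtils.toString(cURL.getErrorStream(), ENCODING);
			cURL.waitFor();
			result = response.contains("HTTP/1.1 200 OK");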
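		} catch (IOException | InterruptedException e) {
			// Exception handling goes here!
			if (e instanceof InterruptedException) {
				// Preserve the interrupt status for callers.
				Thread.currentThread().interrupt();
			}
			result = false;
		}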
		return result;
	}

	private Process createCurlProcess(Path file, String bucket, String fileName,
			String accessKey, String secretKey) throws IOException {
		String dateFormat = ZonedDateTime.now()
				.format(DateTimeFormatter.RFC_1123_DATE_TIME);
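		String relativePath = "/" + bucket + "/" + fileName;
		// String to sign: HTTP verb, Content-MD5 (left empty), Content-Type,
		// Date and the resource path, separated by newlines.
		String stringToSign = "PUT\n\n" + CONTENT_TYPE + "\n"
				+ dateFormat + "\n" + relativePath;
		// The signature is the Base64-encoded HMAC-SHA1 of that string.
		String signature = Base64.getEncoder()
				.encodeToString(hmacSHA1(stringToSign, secretKey));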

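		// -sS hides the progress meter but keeps error messages; -i prints
		// the response headers so the caller can check the status line.
		return new ProcessBuilder(
				"curl", "-sS", "-i", "-X", "PUT",
				"-T", file.toString(),
				"-H", "Host: " + bucket + ".s3.amazonaws.com",
				"-H", "Date: " + dateFormat,
				"-H", "Content-Type: " + CONTENT_TYPE,
				"-H", "Authorization: AWS " + accessKey + ":" + signature,
				"http://" + bucket + ".s3.amazonaws.com/" + fileName)
				// Merge stderr into stdout so a single stream has to be read.
				.redirectErrorStream(true)
				.start();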
	}

	private byte[] hmacSHA1(String data, String key) {
		try {
			Mac mac = Mac.getInstance(ALGORITHM);
			mac.init(new SecretKeySpec(key.getBytes(ENCODING), ALGORITHM));
			return mac.doFinal(data.getBytes(ENCODING));
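		} catch (NoSuchAlgorithmException | InvalidKeyException
				| UnsupportedEncodingException e) {
			// An empty result yields an invalid signature, so the request
			// is rejected and upload() reports a failure.
			return new byte[] {};
		}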
	}
}
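
The signing step can also be tried on its own, without starting cURL. The sketch below rebuilds the same string to sign and the Base64-encoded HMAC-SHA1 signature as the code above. S3SignatureSketch, the bucket name, the file name and both keys are placeholders made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: hypothetical class mirroring the signing step above.
final class S3SignatureSketch {

	private static final String ALGORITHM = "HmacSHA1";

	// Verb, empty Content-MD5 line, Content-Type, Date and resource path.
	static String stringToSign(String verb, String contentType,
			String date, String resource) {
		return verb + "\n\n" + contentType + "\n" + date + "\n" + resource;
	}

	// Returns the Base64-encoded HMAC-SHA1 of the data under the given key.
	static String sign(String data, String secretKey) {
		try {
			Mac mac = Mac.getInstance(ALGORITHM);
			mac.init(new SecretKeySpec(
					secretKey.getBytes(StandardCharsets.UTF_8), ALGORITHM));
			return Base64.getEncoder().encodeToString(
					mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
		} catch (GeneralSecurityException e) {
			throw new IllegalStateException("Cannot compute " + ALGORITHM, e);
		}
	}

	public static void main(String[] args) {
		// All values below are placeholders, not real credentials.
		String toSign = stringToSign("PUT", "application/octet-stream",
				"Tue, 27 Mar 2007 21:15:45 +0000", "/example-bucket/report.bin");
		System.out.println("Authorization: AWS EXAMPLE_ACCESS_KEY:"
				+ sign(toSign, "EXAMPLE_SECRET_KEY"));
	}
}
```

The Date value passed to stringToSign has to match the Date header sent with the request exactly, otherwise the signature check on the server side fails.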

Conclusion

Uploading to Amazon S3 with cURL from Java code is a rare case, but it can be beneficial when the JVM's memory and CPU usage is crucial. Delegating the file upload to cURL keeps the file data out of the JVM heap and adds no work for the garbage collector.