fix(storage): remove partial file on any download_to_filename error#16727
fix(storage): remove partial file on any download_to_filename error#16727RyanCunhaCosta wants to merge 1 commit intogoogleapis:mainfrom
Conversation
Blob.download_to_filename() opens the destination file for writing before the network request begins. If the download fails, the method raises an exception but previously left an empty or partially-written file on disk for most error types (only DataCorruption and NotFound were cleaned up). This extends the cleanup to catch all exceptions, ensuring the partially-written file is removed before re-raising. This prevents callers from seeing a corrupt/empty file after network errors, timeouts, or any other unexpected failures. The inner try/except OSError is a safety guard for the edge case where the file was never successfully created.
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Code Review
This pull request introduces automatic cleanup for destination files when a download fails in download_to_filename, extending the previous error handling to cover all exceptions such as network errors and timeouts. It also adds comprehensive unit tests to verify cleanup behavior during various failure modes. Feedback was provided regarding the placement of the exception handler: nesting it within the with open(...) block prevents file deletion on Windows due to file locking. It is recommended to wrap the entire context manager in the try...except block and include logging for the caught exception to improve debuggability.
Problem
Blob.download_to_filename()opens the destination file for writing before the network request begins. If the download fails (network error, timeout,ConnectionError, etc.) the method raises an exception but leaves an empty or partially-written file on disk.The existing code only cleaned up on
DataCorruptionandNotFound, but every other failure mode — including commonrequests.exceptions.ConnectionError/Timeoutseen with GCS timeouts — still left a corrupt artifact.Observed in the wild:
After the above error the destination file exists on disk with 0 bytes, causing subsequent pipeline runs to treat the file as already downloaded and skip the retry.
Fix
In
_handle_filename_and_download(), replaceexcept (DataCorruption, NotFound)withexcept Exceptionso the file is removed before re-raising, regardless of the failure type. The innertry/except OSErroris a safety guard for the edge case where the file was never successfully created.Tests
Four new unit tests added to
tests/unit/test_blob.py:test_download_to_filename_cleans_up_on_connection_errorConnectionError→ file removedtest_download_to_filename_cleans_up_on_timeoutTimeout→ file removedtest_download_to_filename_cleans_up_on_generic_exceptionRuntimeError→ file removedtest_download_to_filename_keeps_file_on_successExisting tests (
test_download_to_filename_corrupted,test_download_to_filename_notfound) pass unchanged.Notes
os.path.exists()after aDataCorruption/NotFounderror would already receiveFalse; this extends that guarantee to all failure modes.