Operationalizing Lambda

beige-coffee · November 11, 2022, 1:18am

Hi All,

I’m somewhat new to the world of big data, so I apologize if this is a basic question. For context, I’m building a data-processing pipeline that transcribes audio files and identifies the speakers in each file (using Whisper and pyannote). This is a pretty compute-intensive task, which is why I’m looking into using Lambda.

Now for the dumb question:

I’m wondering if it’s possible to operationalize Lambda so that I can use it as part of a larger data-processing pipeline. So far, I’ve just played around with running code in a Jupyter notebook. Ideally, I’d like to set something up so that whenever I push data to a Google Cloud Storage bucket, the code automatically runs on Lambda. Perhaps I just need to SSH into my instance and run the code that way?

If anyone has feedback on 1) whether this is possible and 2) resources to help me accomplish it, that would be great!

Thanks!

beige-coffee · November 11, 2022, 2:29am

I think I answered my own question. And learned a lot while doing it!

Essentially, I just need to SSH into the instance, copy over my code, install dependencies, and run the data processing.
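For anyone who finds this later, here is a rough sketch of how those manual steps could be scripted from a local machine. It only shells out to `rsync` and `ssh`, so both need to be installed locally (and `rsync` on the instance too). The function names, the `requirements.txt` file, the `process.py` entrypoint, and the use of `pip` and `python` on the instance are all my own assumptions for illustration, not anything specific to Lambda. It also doesn't cover triggering on a bucket upload.

```python
import os
import shlex
import subprocess


def build_commands(host, key_path, local_dir, remote_dir, entrypoint="process.py"):
    """Build the two commands that mirror the manual steps.

    host is "user@address" for the instance; key_path is the SSH private key.
    The entrypoint and requirements.txt names are hypothetical placeholders.
    """
    key = os.path.expanduser(key_path)

    # Step 1: copy the code. Trailing slashes make rsync copy the directory
    # contents into remote_dir rather than nesting a subdirectory.
    sync = [
        "rsync", "-az",
        "-e", f"ssh -i {shlex.quote(key)}",
        local_dir.rstrip("/") + "/",
        f"{host}:{remote_dir.rstrip('/')}/",
    ]

    # Steps 2 and 3: install dependencies, then run the processing script.
    # The string is executed by the remote shell, so quote the variable parts.
    remote_script = (
        f"cd {shlex.quote(remote_dir)} && "
        "pip install -r requirements.txt && "
        f"python {shlex.quote(entrypoint)}"
    )
    run = ["ssh", "-i", key, host, remote_script]

    return [sync, run]


def run_steps(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Calling `run_steps(build_commands("ubuntu@<instance-ip>", "~/.ssh/my_key.pem", "./pipeline", "pipeline"))` would then do the copy, install, and run in one go, with the host, key, and paths swapped for your own. Keeping the command building separate from the execution makes it easy to print the commands and check them before anything touches the instance.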

I’m not sure how to fully automate this, though. If anyone has suggestions, please let me know!
