Operationalizing Lambda

Hi All,

I’m somewhat new to the world of big data, so I apologize if this is a basic question. For context, I’m building a data-processing pipeline that transcribes audio files and identifies the speakers in each file (using Whisper and pyannote). This is a fairly compute-intensive task, which is why I’m looking into Lambda.

Now for the dumb question:

I’m wondering whether it’s possible to operationalize Lambda so that I can use it as one stage of a larger data-processing pipeline. So far, I’ve only run code interactively in a Jupyter notebook. Ideally, I’d like every upload to a Google Cloud Storage bucket to automatically trigger the code on Lambda. Or do I just need to SSH into my instance and run the code that way?

If anyone has feedback on 1) whether this is possible and 2) resources that could help me accomplish it, that would be great!

Thanks!

I think I answered my own question. And learned a lot while doing it!

Essentially, I just need to SSH into the instance, copy over my code, install the dependencies, and run the data processing.
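Those three steps can be scripted from your own machine so you don't have to type them by hand each time. Below is a minimal sketch, assuming key-based SSH access to the instance, rsync installed on both ends, and a project folder containing a requirements.txt and an entry script. The host string, folder names, and script name (process.py) are all hypothetical placeholders, and "ubuntu" as the login user is an assumption about the instance image.

```python
import shlex
import subprocess


def build_commands(host, local_dir, remote_dir="pipeline",
                   requirements="requirements.txt", entrypoint="process.py"):
    """Return the copy -> install -> run commands as argument lists.

    All defaults are hypothetical; remote_dir is relative to the remote home directory.
    """
    cd = f"cd {shlex.quote(remote_dir)} && "
    return [
        # 1) Copy the contents of local_dir into remote_dir on the instance.
        ["rsync", "-az", local_dir.rstrip("/") + "/", f"{host}:{remote_dir}/"],
        # 2) Install Python dependencies on the instance.
        ["ssh", host, cd + f"pip install -r {shlex.quote(requirements)}"],
        # 3) Run the processing script on the instance.
        ["ssh", host, cd + f"python {shlex.quote(entrypoint)}"],
    ]


def run_remote_job(host, local_dir, dry_run=False, **kwargs):
    """Run each step in order, stopping on the first failure (check=True)."""
    commands = build_commands(host, local_dir, **kwargs)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show what would run without executing it
        else:
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_remote_job("ubuntu@203.0.113.10", "./pipeline", dry_run=True) prints the three commands so you can check them before running for real. Using check=True means a failed dependency install stops the script instead of launching the job anyway.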

I’m not sure how to fully automate this, though. If anyone has suggestions, please let me know!
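One simple way to automate the trigger is to poll the bucket and start a job for each file you haven't processed yet. The sketch below keeps the polling logic independent of any cloud SDK: list_objects is any function returning object names, and handle is whatever starts processing (for instance, the SSH steps described above). With the google-cloud-storage library you could pass something like lambda: [b.name for b in storage.Client().list_blobs("your-bucket")], where the bucket name is a placeholder. The "seen" set lives only in memory, so this is an illustration rather than a production design; Cloud Storage Pub/Sub notifications would be the event-driven alternative to polling.

```python
import time


def poll_once(list_objects, handle, seen):
    """Call handle(name) for each object name not yet in `seen`; return the new names."""
    new = []
    for name in list_objects():
        if name in seen:
            continue
        handle(name)      # if this raises, the name is not marked seen and is retried next poll
        seen.add(name)
        new.append(name)
    return new


def watch(list_objects, handle, interval_s=60.0, max_polls=None):
    """Poll every interval_s seconds; max_polls=None means run forever."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(list_objects, handle, seen)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return seen
```

Because a name is only recorded after handle returns, a failed job gets retried on the next poll instead of being silently skipped.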