Hi All,
I’m somewhat new to the world of big data, so I apologize if this is a basic question. For context, I’m building a data-processing pipeline that transcribes audio files and identifies the speakers in each file (using Whisper and pyannote). This is a pretty compute-intensive task, which is why I’m looking into using Lambda.
Now for the dumb question:
I’m wondering if it’s possible to operationalize Lambda so that I can use it as part of a larger data-processing pipeline. So far, I’ve only played around with running code in a Jupyter notebook. Ideally, I’d like to set something up so that whenever I push data to a Google Cloud Storage bucket, the upload automatically triggers my code on the Lambda instance. Perhaps I just need to SSH into my instance and run the code that way?
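To make the idea concrete, here is a rough Python sketch of what I’m picturing, not something I have working. The instance address, the transcribe.py script path, and both function names are made up for illustration, and I’m assuming the upload event arrives as a dict with "bucket" and "name" keys and that whatever runs this already has SSH key access to the instance.

```python
import shlex
import subprocess

# Hypothetical values: replace with the real instance address and script path.
INSTANCE = "ubuntu@203.0.113.10"
REMOTE_SCRIPT = "~/pipeline/transcribe.py"


def build_ssh_command(bucket: str, name: str) -> list[str]:
    """Build the ssh invocation that processes one uploaded object."""
    gcs_uri = f"gs://{bucket}/{name}"
    # Quote the URI so object names containing spaces survive the remote shell.
    remote = f"python3 {REMOTE_SCRIPT} {shlex.quote(gcs_uri)}"
    return ["ssh", INSTANCE, remote]


def handle_upload(event: dict) -> None:
    """Run the remote job for one upload event (assumed keys: 'bucket', 'name')."""
    # check=True raises if ssh or the remote script exits with a non-zero status.
    subprocess.run(build_ssh_command(event["bucket"], event["name"]), check=True)
```

Something would then call handle_upload({"bucket": "my-bucket", "name": "audio/call.wav"}) each time a file lands in the bucket. I’m not sure whether blocking on an SSH session like this is a sensible approach for long-running jobs, which is part of what I’m asking.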
If anyone can tell me 1) whether this is possible and 2) which resources would help me accomplish it, that would be great!
Thanks!