Recommended ML algorithm for mapping files to tests

Hey,
I’m looking for the best algorithms to help me map 10,000 code files, spread across some 40 repositories in 3 different projects, to 7,000 tests. For each file, repository, and project, I want to detect which tests most often fail when it changes, so that I can minimize the number of tests that run (on a daily basis) for updates to existing files or for new files just added to a repository/project.

What kind of algorithms would be suitable for this? Classification? Clustering?

Is there a preferred algorithm?

Thanks :slight_smile: