Basic idea of a Convolutional Neural Network

I would like to check my basic understanding of CNNs.

Let's say I have one dataset of 100 pictures, split into two classes:

Class A (Picture of Cat: 40 pictures)
Class B (Picture of Dog: 60 pictures)
Then I feed all 100 pictures into a CNN and run it.

My question is:

Which output should I look at?
Does that mean that if I input a picture (either a cat or a dog), I can tell whether it is a cat or a dog by looking at the output?
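
To make the question concrete, here is a minimal sketch of what I imagine the output could look like. It assumes the last layer of the CNN gives one raw score per class, and that those scores are turned into probabilities with a softmax. The label order, the function names and the example scores are all made up for illustration; no real network is trained here.

```python
import math

# Hypothetical label order: index 0 = Class A (cat), index 1 = Class B (dog).
CLASS_NAMES = ["cat", "dog"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most likely class name and the per-class probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs


# Made-up scores standing in for what a trained CNN might emit for one picture.
example_scores = [0.3, 2.1]
label, probs = predict(example_scores)
print(label, [round(p, 3) for p in probs])
```

If this picture is right, the output I would look at for a new image is the list of two probabilities, and the predicted class is simply the one with the larger value.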
Thank you.