
Assuming you know what a probability density is, the naive way to estimate one from data is with a histogram. That is, you split the space into equally sized bins, count the number of data points that fall in each bin, and take the density estimate in each bin to be proportional to that count (normalized so that the integral is 1).
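To make the normalization concrete, here is a minimal sketch of a one-dimensional histogram estimate. The function name `histogram_density` and its parameters (`lo`, `hi`, `n_bins`) are made up for illustration, and it assumes every data point lies in `[lo, hi)`.

```python
def histogram_density(data, x, lo, hi, n_bins):
    """Histogram density estimate at x, using n_bins equal bins on [lo, hi).

    Assumes all data points lie in [lo, hi); otherwise the estimate
    would no longer integrate to 1.
    """
    if not (lo <= x < hi):
        return 0.0
    width = (hi - lo) / n_bins

    def bin_index(v):
        # Clamp to guard against floating-point rounding at the upper edge.
        return min(int((v - lo) / width), n_bins - 1)

    k = bin_index(x)
    # Count the data points that share a bin with x.
    count = sum(1 for d in data if lo <= d < hi and bin_index(d) == k)
    # Dividing by n * width makes the bin areas sum to 1.
    return count / (len(data) * width)
```

Every `x` inside the same bin gets the same value, which is exactly the piecewise-constant behavior of a histogram.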
Next you might think it's a little weird that this estimate is piecewise constant. If a point is at the edge of a bin, then it seems likely that its density is closer to that of a point just across the edge in the next bin than to that of points all the way on the other side of its own bin.
The natural way to solve this is to estimate the density at each point using a bin centered at that point (and normalizing as appropriate). This is called a box-car kernel, but you should think of it just as a moving average.
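As a rough sketch of the centered-bin idea in one dimension (the name `boxcar_density` and the half-width parameter `h` are my own labels, not standard API):

```python
def boxcar_density(data, x, h):
    """Density estimate at x from a bin of half-width h centered at x."""
    # Count the data points within distance h of x.
    count = sum(1 for d in data if abs(d - x) <= h)
    # The bin has width 2h, so divide by n * 2h to get a density.
    return count / (len(data) * 2 * h)
```

Unlike the fixed-bin histogram, the bin here slides along with `x`, so the estimate at a point is never influenced more by the far side of an arbitrary bin than by its immediate neighbors.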
Other kernels are just generalizations of this moving average concept. Usually, instead of weighting all points in the bin equally, you want data points closer to the location where you're estimating the density to have a higher weight.
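Here is a sketch of that weighted version, using a Gaussian as one common choice of weighting function. The names `gaussian_kernel` and `kde`, and the bandwidth parameter `h`, are illustrative picks of mine.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: largest at u = 0, decaying smoothly."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, x, h, kernel=gaussian_kernel):
    """Kernel density estimate at x with bandwidth h.

    Each data point d contributes kernel((x - d) / h), so points near x
    count for more than points far away. Dividing by n * h keeps the
    total integral equal to 1 as long as the kernel itself integrates to 1.
    """
    return sum(kernel((x - d) / h) for d in data) / (len(data) * h)
```

Passing a kernel that is constant on `[-1, 1]` and zero elsewhere recovers the plain moving average.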
There are two things to consider when choosing a kernel: shape and bandwidth. In terms of shape, some are rounder, some are more triangular, and some (like the box-car kernel) are more flat.
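For a feel of what those shapes look like as functions, here is a sketch of three kernels on `[-1, 1]`. The Epanechnikov kernel is my pick for a "rounder" example; the helper `integrate` is a hypothetical midpoint-rule check, only there to confirm each shape has total weight 1.

```python
def boxcar(u):
    """Flat: every point within the window gets the same weight."""
    return 0.5 if abs(u) <= 1 else 0.0

def triangular(u):
    """Triangular: weight falls off linearly with distance."""
    return max(0.0, 1.0 - abs(u))

def epanechnikov(u):
    """Rounder: an upside-down parabola clipped at zero."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0

def integrate(kernel, lo=-1.0, hi=1.0, steps=10000):
    """Midpoint-rule approximation of the kernel's integral over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * dx) for i in range(steps)) * dx
```

All three integrate to 1 over their support; they only differ in how the weight is distributed between nearby and distant points.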
Bandwidth describes how fast the weights fall off with distance. If you're using flat bins, you can think of this as choosing how wide the bins are. In practice, it turns out that bandwidth matters a lot more than kernel shape.
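To see the effect of bandwidth, the sketch below evaluates a Gaussian kernel estimate on a grid and counts its bumps for a narrow and a wide bandwidth. The toy data set, the grid, and the helper names `kde` and `count_modes` are all made up for this illustration.

```python
import math

def kde(data, x, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / norm

def count_modes(data, h, grid):
    """Count strict local maxima of the estimate along the grid."""
    vals = [kde(data, x, h) for x in grid]
    return sum(
        1
        for i in range(1, len(vals) - 1)
        if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]
    )

data = [-1.0, 0.0, 1.0]                    # toy data, one unit apart
grid = [i / 20 for i in range(-80, 81)]    # -4.0 to 4.0 in steps of 0.05

narrow = count_modes(data, 0.1, grid)      # one spike per data point
wide = count_modes(data, 2.0, grid)        # everything merged into one bump
```

With the narrow bandwidth the estimate is a separate spike at each data point, while the wide bandwidth smooths all three into a single broad bump. Swapping the Gaussian for another kernel shape changes this picture far less than changing `h` does.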