Seqram is largely correct, but for simplicity has restricted the discussion to English examples which are clearly diphthongs in both spelling and pronunciation. I would like to make some more general points, with examples from other languages.
A diphthong is a sequence of vowels, but not every sequence of vowels is a diphthong. In Japanese any sequence is permitted: ie ia uo ae oi etc., and each takes as long to say as any other two-syllable sequence like ike ane ori etc. So these sequences are not diphthongs. The key is that a diphthong is a single phoneme (that is, speakers feel it to be a single sound) and makes a single syllable. In bit bat beet bite boot bout we don't notice in speech that some have one vowel and others two: all are a single syllable.
We should dispose of the idea that a sequence of letters is a diphthong. This is more correctly called a digraph: the vowel in beet and beat is a single long vowel, not a diphthong. Webster 1913 notes this, but shows its age in using rain as an example. For Scots, that is a pure long vowel; most other English speakers these days have a diphthong, as Seqram also notes.
Now to notation. Most languages, if they have diphthongs at all (about a third of them do), include ones of the type /ai/ and /au/, as in high and how; that is, ones where a low vowel /a/ rises to a higher position near the roof of the mouth. The "rising" here is a literal raising of the tongue. The endpoint is either the vowel [I] as in bit, or [i] as in beet, which is slightly higher, but the two are phonetically close together.
They are also close to the glide consonant of you, yes, which in phonetic notation is written [j] (though [y] is also used, as in Seqram's expositions). So the precise sequences [aI] and [ai] and [aj] sound quite similar. A particular language or dialect may use one or the other, and it may depend on position too.
For example, in my own speech I use [aI] before a consonant and [aj] finally or before a vowel. But in the more conservative RP style of British speech, it is always [aI], and I may use that if I am speaking more carefully.
Somewhat lower than [I] is [e], and Latin had only [ae]. Welsh has both [ae] and [ai] and they can be used contrastively.
Anything I've said about /ai/ vs /aj/ applies equally well to /au/ vs /aw/. (Where I use square brackets I mean a precise shade of sound; slants enclose a broader approximation.)
To some extent this is just notational. In American phonetic writing, /ay/ and /aw/ are almost always used. They are less common elsewhere. In Arabic script too, these diphthongs are always written as if they were /ay/ and /aw/. The British usage of /ai/ and /au/ reflects the older RP speech.
Since writing the above I've found a study comparing how much of a diphthong's duration is taken up by the transitional phase between the start and end points: for English the proportion is very high, with something like 80% of the duration spent in moving the tongue, while in Arabic it's very low, about 20%. This could well explain why the Arabic sounds are felt to end in consonants.
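To make that proportion concrete, here is a minimal sketch in Python. It assumes a diphthong can be cut into three phases: a steady start, a glide, and a steady end. The function name transition_share and the millisecond values are hypothetical, chosen only to reproduce the rough 80% and 20% figures quoted above; they are not measurements from the study.

```python
def transition_share(onset_ms: float, glide_ms: float, offset_ms: float) -> float:
    """Return the fraction of a diphthong's total duration spent in the glide.

    onset_ms  -- time held steady on the first vowel target
    glide_ms  -- time spent moving from the first target to the second
    offset_ms -- time held steady on the second target
    """
    total = onset_ms + glide_ms + offset_ms
    if total <= 0:
        raise ValueError("total duration must be positive")
    return glide_ms / total


# Hypothetical durations (illustrative only, not data from the study).
english_like = transition_share(onset_ms=20, glide_ms=160, offset_ms=20)
arabic_like = transition_share(onset_ms=80, glide_ms=40, offset_ms=80)

print(f"English-like: {english_like:.0%} of the duration is glide")
print(f"Arabic-like:  {arabic_like:.0%} of the duration is glide")
```

On this picture, a sound with a short glide and a long steady end point spends most of its time sitting on the two targets, which is at least consistent with hearing the second target as a separate consonant.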
Rising and falling
Having said all this, I have to now say that rising and falling don't actually mean physical rise and fall of the tongue, as in /ai/ and /au/. They refer to the fact that the /a/ component is the more energetic, and the second component is a weaker off-glide. We may notate them as /ái/ and /áu/. The energy falls off. This is called a falling diphthong.
I don't know of any language in which rising diphthongs /aí/ or /aú/ occur. (The Spanish name Raúl is two syllables, so doesn't count.) However we can see examples of the same vowel sequence changing historically from falling to rising.
French OI is now pronounced /wa/, yet clearly some five hundred years ago or before it was as in English: words borrowed from Middle French like royal still have the simple /ói/ diphthong in English. (And Shakespeare plays on French moi sounding like English moy, though the change had actually taken place earlier; perhaps Shakespeare's pronunciation of French was Anglicized, I'm not sure.) What happened is that /oi/, a normal falling diphthong /ói/, shifted to /óe/, then changed from falling to rising /oé/, which sounds rather like /we/. (It then continued to change to /wa/, though the aristocrats continued to say /we/ until the Revolution.)
Latin didn't have /oi/, but it had /oe/. This came from two sources: as transcription of Greek /oi/, and in native words after labial consonants /f/ /m/ /p/, as in /poena/ 'punishment'. The two sources are unconnected. (Original Latin /oi/ had changed to long /i/ before the classical period, as in /domini/ 'masters' from earlier /dominoi/.)
Probably what happened is that an original */pena/ developed a /w/ sound or a rounding of the vowel near it, because the consonant was labial. Then /poéna/ became /póena/, switching from rising to falling while keeping the same sequence. This gave a falling diphthong close enough to /ói/ that it could be used for Greek /oi/ once they started borrowing from Greek.