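This is possible because the container carries metadata that tells the decoder to drop a specific number of samples at the beginning and at the end of the stream: the 'pre-skip' field in the header covers the start, and the granule position of the final page covers the end.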
See these parts of the Ogg Opus specification (RFC 7845):
> 4.2. Pre-skip
>
> The 'pre-skip' field MAY also be used to perform sample-accurate
> cropping of already encoded streams. In this case, a value of at
> least 3840 samples (80 ms) provides sufficient history to the decoder
> that it will have converged before the stream's output begins.
>
> 4.4. End Trimming
>
> The page with the 'end of stream' flag set MAY have a granule
> position that indicates the page contains less audio data than would
> normally be returned by decoding up through the final packet. This
> is used to end the stream somewhere other than an even frame
> boundary. The granule position of the most recent audio data page
> with completed packets is used to make this determination, or '0' is
> used if there were no previous audio data pages with a completed
> packet. The difference between these granule positions indicates how
> many samples to keep after decoding the packets that completed on the
> final page. The remaining samples are discarded. The number of
> discarded samples SHOULD be no larger than the number decoded from
> the last packet.
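
To make the two mechanisms concrete, here is a minimal Python sketch. It assumes the whole stream has already been decoded into a list of 48 kHz sample frames and that granule positions count from zero. The function names (`parse_pre_skip`, `samples_kept_on_final_page`, `crop_decoded`) are made up for illustration and do not belong to any Opus library.

```python
import struct


def parse_pre_skip(opus_head: bytes) -> int:
    """Read the 16-bit little-endian pre-skip field from an OpusHead packet."""
    if opus_head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Layout: magic (8 bytes), version (1), channel count (1), pre-skip (2, LE).
    (pre_skip,) = struct.unpack_from("<H", opus_head, 10)
    return pre_skip


def samples_kept_on_final_page(final_granule: int, previous_granule: int) -> int:
    """End trimming: the granule difference is how many samples of the
    final page's decoded output are kept; the rest are discarded."""
    return final_granule - previous_granule


def crop_decoded(decoded: list, pre_skip: int, final_granule: int) -> list:
    """Drop `pre_skip` samples from the start and everything past the final
    granule position (assumption: granule positions start at zero)."""
    end = min(final_granule, len(decoded))
    return decoded[pre_skip:end]
```

For example, if the previous page ends at granule position 8640 and the final page is marked 9000, `samples_kept_on_final_page(9000, 8640)` gives 360: only the first 360 samples decoded from the final page are played. A real player would apply the same arithmetic while streaming rather than decoding everything up front.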