Question:

You have a stream of bytes from which you can read one byte at a time. You only have enough space to store one byte. After processing those bytes, you have to return a random byte. Note: The probability of picking any one of those bytes should be equal.

Answer:

Since we can only store one byte at a time, we have to be able to work with the bytes as they come in. We can’t store everything, count the number of bytes and pick one. That would be too easy wouldn’t it?

Let’s think it through. We don’t know how many bytes there are, so at any point in time we should have a random byte picked from all the bytes we have seen so far. We shouldn’t worry about the total number of bytes at any time.

When we get the first byte, it is simple. There is only one so we store it. When we get the second one, it has 1/2 probability of being picked and the one we have stored has a 1/2 probability of being picked (we can use any programming language’s built-in random function). After picking one, we replace the current stored byte with the one we picked. Now it gets interesting. When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked. We pick one of the two bytes based on this probability. We can keep doing this forever.

Just generalizing, when we get the nth byte, it has a 1/n probability of being picked and the byte we have stored has (n-1)/n probability of being picked.

Sometimes, this question is a little confusing. The question said you have only space to store one byte but there is an assumption that you have space to store variables to track the number of bytes and so on. Another assumption is that the stream of bytes is not infinite. If not, we won’t be able to calculate (n-1)/n.

You have a stream of bytes from which you can read one byte at a time. You only have enough space to store one byte. After processing those bytes, you have to return a random byte. Note: The probability of picking any one of those bytes should be equal.

Answer:

Since we can only store one byte at a time, we have to be able to work with the bytes as they come in. We can’t store everything, count the number of bytes and pick one. That would be too easy wouldn’t it?

Let’s think it through. We don’t know how many bytes there are, so at any point in time we should have a random byte picked from all the bytes we have seen so far. We shouldn’t worry about the total number of bytes at any time.

When we get the first byte, it is simple. There is only one so we store it. When we get the second one, it has 1/2 probability of being picked and the one we have stored has a 1/2 probability of being picked (we can use any programming language’s built-in random function). After picking one, we replace the current stored byte with the one we picked. Now it gets interesting. When we get the third one, it has 1/3 probability of being picked and the one we have stored has a 2/3 probability of being picked. We pick one of the two bytes based on this probability. We can keep doing this forever.

Just generalizing, when we get the nth byte, it has a 1/n probability of being picked and the byte we have stored has (n-1)/n probability of being picked.

Sometimes, this question is a little confusing. The question said you have only space to store one byte but there is an assumption that you have space to store variables to track the number of bytes and so on. Another assumption is that the stream of bytes is not infinite. If not, we won’t be able to calculate (n-1)/n.