According to just about everybody I’ve heard speak on the subject, whenever Siri is invoked on the iPhone 4S, the voice command is sent to an Apple server somewhere (presumably after going through some local serialization step) for recognition processing and, if necessary, to look up whatever web resources are required to fulfill the user’s request.
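For the curious, here’s a minimal sketch of what that round trip might look like from the client’s side. To be clear, the endpoint URL, payload format, and `RecognizedIntent` shape are all made up for illustration; Apple’s actual protocol is undocumented.

```swift
import Foundation

// A structured "intent" the server might send back after recognizing speech.
// This shape is purely hypothetical.
struct RecognizedIntent: Decodable {
    let action: String        // e.g. "call", "set_alarm"
    let parameters: [String: String]
}

// Ship the serialized audio to a (hypothetical) recognition endpoint and
// hand back whatever structured intent the server produces. The client
// stays dumb: all the interesting work happens on the other end.
func recognize(audio: Data, completion: @escaping (RecognizedIntent?) -> Void) {
    var request = URLRequest(url: URL(string: "https://speech.example.com/recognize")!)
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    request.httpBody = audio

    URLSession.shared.dataTask(with: request) { data, _, _ in
        let intent = data.flatMap { try? JSONDecoder().decode(RecognizedIntent.self, from: $0) }
        completion(intent)
    }.resume()
}
```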
I’ve heard a few people wonder out loud why Siri needs to make a round trip to Apple’s servers for a simple command like “Call Mary,” which could, in theory, be executed using only the information already on the phone (i.e., no external web resources are needed to complete the request). After all, the hardware and software inside the iPhone ought to be able to figure out the word “call” and deduce which contact in your address book would probably go by “Mary”, right?
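To make that argument concrete, here’s a naive, entirely on-device version of that lookup (the `Contact` type and the matching logic are invented for this sketch). The point is that once the speech has been turned into text, the rest really is easy; it’s the speech-to-text step that gets shipped off to the server.

```swift
import Foundation

struct Contact {
    let name: String
    let phoneNumber: String
}

// Handle a "Call <name>" command locally, assuming we already have a
// text transcript of what the user said.
func handleCallCommand(_ transcript: String, contacts: [Contact]) -> Contact? {
    let words = transcript.lowercased().split(separator: " ").map(String.init)
    guard let callIndex = words.firstIndex(of: "call"), callIndex + 1 < words.count else {
        return nil
    }
    let spokenName = words[(callIndex + 1)...].joined(separator: " ")
    // Pick the first contact whose name contains the spoken name, if any.
    return contacts.first { $0.name.lowercased().contains(spokenName) }
}

let contacts = [Contact(name: "Mary Smith", phoneNumber: "555-0100")]
let match = handleCallCommand("Call Mary", contacts: contacts)
// match?.phoneNumber == "555-0100"
```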
I have a theory about why (again, as far as I know and have heard) Siri sends each and every request to an Apple server: when all the processing happens server-side, the recognition engine can be tuned and refined without having to deploy updated versions of iOS (remember, Siri isn’t a standalone app that Apple can push updates to on its own).
I’m fairly certain that there are a non-trivial number of Siri engineers whose job it is to simply monitor the incoming request data and see which phrases aren’t returning a result because the system doesn’t understand them.
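If that’s right, you can imagine the triage starting out as simply as tallying the transcripts that didn’t map to any known intent. Here’s a toy version of that bookkeeping; everything in it is invented for illustration.

```swift
// Tally transcripts the recognizer couldn't map to any intent, so the
// most frequently failing phrases float to the top of the queue.
struct FailedRequestLog {
    private var counts: [String: Int] = [:]

    mutating func record(unrecognized transcript: String) {
        counts[transcript.lowercased(), default: 0] += 1
    }

    func topFailures(_ n: Int) -> [(phrase: String, count: Int)] {
        counts.sorted { $0.value > $1.value }
              .prefix(n)
              .map { (phrase: $0.key, count: $0.value) }
    }
}
```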
Of course, I think they’re also refining Siri’s ability to recognize different accents and colloquialisms within existing regions or locales (a great many common words are spoken differently in South Boston than they are in Plano, Texas).
My point is that the more data Siri’s engineers can get their hands on, the better they can make it.