Mine kept two cumulative lists, one summing left to right and the other summing right to left. Run sequentially this is slower, but it could conceivably be faster if the two lists are built in parallel, which leaves the final loop as something like:
for i in xrange(len(A)):
    if right[i] == left[i]:
        return i
The final loop here is slightly tighter, so as long as the parallelism doesn't add much overhead, this should be faster.
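For anyone who doesn't want to follow the links, here is a minimal sequential sketch of the two-list idea described above. It is not the pastebin code: the function name `equilibrium_index` and the `-1` return for "no such index" are my own assumptions, and it uses Python 3 (`range`, `itertools.accumulate`) rather than `xrange`. Both running sums include `A[i]`, so they are equal exactly when the elements before `i` and the elements after `i` sum to the same value.

```python
from itertools import accumulate


def equilibrium_index(A):
    """Return the first i with sum(A[:i]) == sum(A[i+1:]), or -1 if none.

    The name and the -1 sentinel are assumptions made for this sketch.
    """
    # left[i] = A[0] + ... + A[i]
    left = list(accumulate(A))
    # right[i] = A[i] + ... + A[-1]: accumulate the reversed list, then flip it back
    right = list(accumulate(reversed(A)))[::-1]
    # Both sums include A[i], so equality means the two sides of i balance.
    for i in range(len(A)):
        if left[i] == right[i]:
            return i
    return -1
```

The two `accumulate` passes don't depend on each other, which is the part that could in principle run in parallel. Whether that pays off depends on the overhead of whatever parallel mechanism is used.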
Original: http://pastebin.com/m6e742f56
Parallelised: http://pastebin.com/m32a60b1