Consider the online learning problem with demand learning. A firm sells a product without any historical demand information. In each period, the firm sets a price and observes the demand realized at that price. Suppose the true demand function in each period is $D(p) = 9 - 3p + \epsilon$, where $\epsilon$ is a random variable with zero mean. The marginal cost is negligible.

(a) What price should the firm charge if you know the demand function? What would be the expected revenue if the firm implements this price for $n$ periods?

(b) Suppose you do not know the demand function, but you know that demand is linear in price with slope 3. In other words, you know that demand is $D(p) = a - 3p + \epsilon$ and would like to estimate $a$ from the data. Suppose the historical data give you $K$ price-demand pairs, i.e., $(p_1, D_1), (p_2, D_2), \ldots, (p_K, D_K)$. Suppose you would like to minimize the residual sum of squares, i.e.,

$$\min_{a} \sum_{i=1}^{K} \left(D_i - a + 3p_i\right)^2. \tag{1}$$

What would be your best estimate of $a$? Hint: you need to find the optimal solution to the optimization problem in (1).

(c) Now suppose you use the iterative least squares method. Starting with the price $p_1 = 1$, in each period $t = 1, 2, \ldots, T$, the following events happen sequentially:

(1) You charge the price $p_t$;
(2) You receive $D_t = 9 - 3p_t + \epsilon_t$, where $\epsilon_t$ is a sample of $\epsilon$;
(3) You solve the optimization problem (1) with $K = t$ and obtain the optimal solution $a_t$;
(4) You set $p_{t+1} = a_t / 6$.

Suppose the realizations of $\epsilon$ are $(\epsilon_1, \epsilon_2, \ldots, \epsilon_8) = (1, -1, 2, -2, 1, -1, 2, -2)$. What is the average regret from implementing the iterative least squares algorithm? Hint: you do not have to code since all calculations are simple.
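Although the hint says no code is required, the hand calculation in part (c) can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: it takes the minimizer of (1) to be the sample mean of $D_i + 3p_i$ (obtained by setting the derivative with respect to $a$ to zero), and it assumes that "regret" in a period means the gap between the expected revenue at the full-information price and the expected revenue at the price actually charged, averaged over the 8 periods. The function names are hypothetical and do not come from the problem statement.

```python
def estimate_intercept(prices, demands, slope=3.0):
    """Closed-form minimizer of sum_i (D_i - a + slope * p_i)^2 over a.

    Setting the derivative to zero gives a = mean(D_i + slope * p_i).
    """
    targets = [d + slope * p for p, d in zip(prices, demands)]
    return sum(targets) / len(targets)


def run_iterative_least_squares(noise, a_true=9.0, slope=3.0, first_price=1.0):
    """Simulate steps (1)-(4) of part (c) for the given noise realizations."""
    prices, demands, estimates = [], [], []
    price = first_price
    for eps in noise:
        demand = a_true - slope * price + eps      # step (2): observe demand
        prices.append(price)
        demands.append(demand)
        a_hat = estimate_intercept(prices, demands, slope)  # step (3)
        estimates.append(a_hat)
        price = a_hat / (2 * slope)                # step (4): p_{t+1} = a_t / 6
    return prices, demands, estimates


def average_regret(prices, a_true=9.0, slope=3.0):
    """Assumed regret definition: expected-revenue gap per period, averaged."""
    best_price = a_true / (2 * slope)
    best_revenue = best_price * (a_true - slope * best_price)
    gaps = [best_revenue - p * (a_true - slope * p) for p in prices]
    return sum(gaps) / len(gaps)


noise = [1, -1, 2, -2, 1, -1, 2, -2]
prices, demands, estimates = run_iterative_least_squares(noise)
avg_regret = average_regret(prices)

for t, (p, d, a_hat) in enumerate(zip(prices, demands, estimates), start=1):
    print(f"t={t}: price={p:.4f}, demand={d:.4f}, a_t={a_hat:.4f}")
print(f"average regret over {len(prices)} periods: {avg_regret:.6f}")
```

Because the zero-mean noise does not affect expected revenue, this regret definition depends only on the prices charged. If a course defines regret in terms of realized revenue instead, only `average_regret` would need to change.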