AGARWAL, A., CHAPELLE, O., DUDÍK, M., AND LANGFORD, J. 2011. A reliable effective terascale linear
learning system. CoRR abs/1110.4198.
AGARWAL, D., AGRAWAL, R., KHANNA, R., AND KOTA, N. 2010. Estimating rates of rare events with multiple
hierarchies through scalable log-linear models. In Proceedings of the 16th ACM SIGKDD international
conference on Knowledge discovery and data mining. 213–222.
ASHKAN, A., CLARKE, C. L. A., AGICHTEIN, E., AND GUO, Q. 2009. Estimating ad clickthrough rate
through query intent analysis. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference
on Web Intelligence and Intelligent Agent Technology.
AUER, P., CESA-BIANCHI, N., AND FISCHER, P. 2002. Finite-time analysis of the multiarmed bandit problem.
Machine learning 47, 2, 235–256.
BACH, F., JENATTON, R., MAIRAL, J., AND OBOZINSKI, G. 2011. Optimization with sparsity-inducing
penalties. Foundations and Trends in Machine Learning 4, 1, 1–106.
BISHOP, C. M. 2006. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc.
BLOOM, B. 1970. Space/time trade-offs in hash coding with allowable errors. Communications of the
ACM 13, 7, 422–426.
CANINI, K., CHANDRA, T., IE, E., MCFADDEN, J., GOLDMAN, K., GUNTER, M., HARMSEN, J., LEFEVRE,
K., LEPIKHIN, D., LLINARES, T. L., MUKHERJEE, I., PEREIRA, F., REDSTONE, J., SHAKED, T., AND
SINGER, Y. 2012. Sibyl: A system for large scale supervised machine learning. Presentation at MLSS
Santa Cruz, http://users.soe.ucsc.edu/~niejiazhong/slides/chandra.pdf.
CHAKRABARTI, D., AGARWAL, D., AND JOSIFOVSKI, V. 2008. Contextual advertising by combining relevance
with click feedback. In Proceedings of the 17th international conference on World Wide Web.
417–426.
CHANG, Y.-W., HSIEH, C.-J., CHANG, K.-W., RINGGAARD, M., AND LIN, C.-J. 2010. Training and testing
low-degree polynomial data mappings via linear SVM. The Journal of Machine Learning Research 11,
1471–1490.
CHAPELLE, O. AND LI, L. 2011. An empirical evaluation of Thompson sampling. In Advances in Neural Information
Processing Systems 24, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger,
Eds. 2249–2257.
CHEN, S. AND GOODMAN, J. 1999. An empirical study of smoothing techniques for language modeling.
Computer Speech & Language 13, 4, 359–393.
CHENG, H. AND CANTÚ-PAZ, E. 2010. Personalized click prediction in sponsored search. In Proceedings of
the third ACM international conference on Web search and data mining.
CHENG, H., ZWOL, R. V., AZIMI, J., MANAVOGLU, E., ZHANG, R., ZHOU, Y., AND NAVALPAKKAM, V. 2012.
Multimedia features for click prediction of new ads in display advertising. In Proceedings of the 18th
ACM SIGKDD international conference on Knowledge discovery and data mining.
CHU, C., KIM, S., LIN, Y., YU, Y., BRADSKI, G., NG, A., AND OLUKOTUN, K. 2007. Map-reduce for machine
learning on multicore. In Advances in Neural Information Processing Systems 19: Proceedings of the
2006 Conference. Vol. 19.
CIARAMITA, M., MURDOCK, V., AND PLACHOURAS, V. 2008. Online learning from click data for sponsored
search. In Proceedings of the 17th international conference on World Wide Web. 227–236.
CORTES, C., MANSOUR, Y., AND MOHRI, M. 2010. Learning bounds for importance weighting. In Advances
in Neural Information Processing Systems. Vol. 23. 442–450.
DEAN, J. AND GHEMAWAT, S. 2008. MapReduce: simplified data processing on large clusters. Communications
of the ACM 51, 1, 107–113.
DUCHI, J., HAZAN, E., AND SINGER, Y. 2010. Adaptive subgradient methods for online learning and
stochastic optimization. Journal of Machine Learning Research 12, 2121–2159.
EVGENIOU, T. AND PONTIL, M. 2004. Regularized multi-task learning. In Proceedings of the tenth ACM
SIGKDD international conference on Knowledge discovery and data mining. ACM, 109–117.
GELMAN, A. AND HILL, J. 2006. Data analysis using regression and multilevel/hierarchical models. Cambridge
University Press.
GITTINS, J. C. 1989. Multi-armed Bandit Allocation Indices. Wiley Interscience Series in Systems and Optimization.
John Wiley & Sons Inc.
GRAEPEL, T., CANDELA, J. Q., BORCHERT, T., AND HERBRICH, R. 2010. Web-scale Bayesian click-through
rate prediction for sponsored search advertising in Microsoft’s Bing search engine. In Proceedings of the
27th International Conference on Machine Learning.
GUYON, I. AND ELISSEEFF, A. 2003. An introduction to variable and feature selection. The Journal of
Machine Learning Research 3, 1157–1182.
HILLARD, D., MANAVOGLU, E., RAGHAVAN, H., LEGGETTER, C., CANTÚ-PAZ, E., AND IYER, R. 2011. The
sum of its parts: reducing sparsity in click estimation with query segments. Information Retrieval, 1–22.
HILLARD, D., SCHROEDL, S., MANAVOGLU, E., RAGHAVAN, H., AND LEGGETTER, C. 2010. Improving ad
relevance in sponsored search. In Proceedings of the third ACM international conference on Web search
and data mining. 361–370.
KEARNS, M. 1993. Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-
Fifth Annual ACM Symposium on the Theory of Computing. 392–401.
KING, G. AND ZENG, L. 2001. Logistic regression in rare events data. Political analysis 9, 2, 137–163.
KOEPKE, H. AND BILENKO, M. 2012. Fast prediction of new feature utility. In Proceedings of the 29th
International Conference on Machine Learning.
KOTA, N. AND AGARWAL, D. 2011. Temporal multi-hierarchy smoothing for estimating rates of rare events.
In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data
mining.
LAI, T. AND ROBBINS, H. 1985. Asymptotically efficient adaptive allocation rules. Advances in applied
mathematics 6, 4–22.
LANGFORD, J., LI, L., AND STREHL, A. 2007. Vowpal wabbit open source project.
https://github.com/JohnLangford/vowpal_wabbit/wiki.
LI, L., CHU, W., LANGFORD, J., AND SCHAPIRE, R. E. 2010. A contextual-bandit approach to personalized
news article recommendation. In Proceedings of the 19th international conference on World Wide Web.
661–670.
LI, L., CHU, W., LANGFORD, J., AND WANG, X. 2011. Unbiased offline evaluation of contextual-bandit-based
news article recommendation algorithms. In Proceedings of the fourth ACM international conference on
Web search and data mining. 297–306.
LIU, Y., PANDEY, S., AGARWAL, D., AND JOSIFOVSKI, V. 2012. Finding the right consumer: optimizing for
conversion in display advertising campaigns. In Proceedings of the fifth ACM international conference
on Web search and data mining.
LOW, Y., GONZALEZ, J., KYROLA, A., BICKSON, D., GUESTRIN, C., AND HELLERSTEIN, J. M. 2010.
GraphLab: A new framework for parallel machine learning. In The 26th Conference on Uncertainty in
Artificial Intelligence.
MACQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings
of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California
Press, Berkeley, CA, 281–297.
MCAFEE, R. 2011. The design of advertising exchanges. Review of Industrial Organization, 1–17.
MCMAHAN, H. B. AND STREETER, M. 2010. Adaptive bound optimization for online convex optimization.
In Proceedings of the 23rd Annual Conference on Learning Theory.
MEEK, C., CHICKERING, D. M., AND WILSON, D. 2005. Stochastic and contingent payment auctions. In
Workshop on Sponsored Search Auctions, ACM Electronic Commerce.
MEIER, L., VAN DE GEER, S., AND BÜHLMANN, P. 2008. The group lasso for logistic regression. Journal of
the Royal Statistical Society: Series B (Statistical Methodology) 70, 1, 53–71.
MENARD, S. 2001. Applied logistic regression analysis. Vol. 106. Sage Publications, Inc.
MENON, A. K., CHITRAPURA, K.-P., GARG, S., AGARWAL, D., AND KOTA, N. 2011. Response prediction
using collaborative filtering with hierarchies and side-information. In Proceedings of the 17th ACM
SIGKDD international conference on Knowledge discovery and data mining.
MINKA, T. 2003. A comparison of numerical optimizers for logistic regression. Tech. rep., Microsoft Research.
MUTHUKRISHNAN, S. 2009. Ad exchanges: Research issues. In Proceedings of the 5th International Workshop
on Internet and Network Economics.
NIGAM, K., LAFFERTY, J., AND MCCALLUM, A. 1999. Using maximum entropy for text classification. In
IJCAI-99 workshop on machine learning for information filtering. Vol. 1. 61–67.
NOCEDAL, J. 1980. Updating quasi-Newton matrices with limited storage. Mathematics of computation
35, 151, 773–782.
OWEN, A. 2007. Infinitely imbalanced logistic regression. The Journal of Machine Learning Research 8,
761–773.
REGELSON, M. AND FAIN, D. C. 2006. Predicting click-through rate using keyword clusters. In Proceedings
of the Second Workshop on Sponsored Search Auctions.
RICHARDSON, M., DOMINOWSKA, E., AND RAGNO, R. 2007. Predicting clicks: estimating the click-through
rate for new ads. In Proceedings of the 16th International conference on World Wide Web. New York, NY,
521–530.
ROSALES, R. AND CHAPELLE, O. 2011. Attribute selection by measuring information on reference distributions.
In Tech Pulse Conference, Yahoo!
ROSALES, R., CHENG, H., AND MANAVOGLU, E. 2012. Post-click conversion modeling and analysis for non-guaranteed
delivery display advertising. In Proceedings of the fifth ACM international conference on
Web search and data mining. ACM, 293–302.
SARKAR, J. 1991. One-armed bandit problems with covariates. The Annals of Statistics, 1978–2002.
SCHÖLKOPF, B. AND SMOLA, A. 2001. Learning with kernels: Support vector machines, regularization, optimization,
and beyond. MIT press.
SHI, Q., PETTERSON, J., DROR, G., LANGFORD, J., SMOLA, A., AND VISHWANATHAN, S. 2009. Hash kernels
for structured data. The Journal of Machine Learning Research 10, 2615–2637.
TEO, C., LE, Q., SMOLA, A., AND VISHWANATHAN, S. 2007. A scalable modular convex solver for regularized
risk minimization. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge
discovery and data mining.
THOMPSON, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the
evidence of two samples. Biometrika 25, 3–4, 285–294.
WEINBERGER, K., DASGUPTA, A., LANGFORD, J., SMOLA, A., AND ATTENBERG, J. 2009. Feature hashing
for large scale multitask learning. In Proceedings of the 26th Annual International Conference on
Machine Learning. 1113–1120.
YE, J., CHOW, J.-H., CHEN, J., AND ZHENG, Z. 2009. Stochastic gradient boosted distributed decision trees.
In Proceedings of the 18th ACM conference on Information and knowledge management. 2061–2064.
ZAHARIA, M., CHOWDHURY, M., DAS, T., DAVE, A., MA, J., MCCAULEY, M., FRANKLIN, M., SHENKER,
S., AND STOICA, I. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory
cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and
Implementation.