Where available, the publication is linked as a PDF. To access it, please click on the PDF symbol at the bottom right of the listed publication.
2019 

1.  Saha, Aadirupa; Shivanna, Rakesh; Bhattacharyya, Chiranjib: How Many Pairwise Preferences Do We Need to Rank a Graph Consistently? Conference. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 2019. Abstract  BibTeX  Tags: Machine Learning  Links:
@conference{Aadirupa2019,
  title = {How Many Pairwise Preferences Do We Need to Rank a Graph Consistently?},
  author = {Aadirupa Saha and Rakesh Shivanna and Chiranjib Bhattacharyya},
  url = {http://10.0.54.4/wpcontent/uploads/2019/09/Howmanypairwisepreferencesdoweneedtorankagraphconsistently1.pdf},
  doi = {10.1609/aaai.v33i01.33014830},
  year = {2019},
  date = {2019-07-17},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {33},
  pages = {4830-4837},
  abstract = {We consider the problem of optimal recovery of the true ranking of n items from a randomly chosen subset of their pairwise preferences. It is well known that, without any further assumption, a sample size of Ω(n^2) is required for the purpose. We analyze the problem with the additional structure of a relational graph G([n], E) over the n items, together with a locality assumption: neighbouring items are similar in their rankings. Noting the preferential nature of the data, we choose to embed not the graph but its strong product, to capture the pairwise node relationships. Furthermore, unlike the existing literature, which uses the Laplacian embedding for graph-based learning problems, we use a richer class of graph embeddings (orthonormal representations) that includes the (normalized) Laplacian as a special case. Our proposed algorithm, PrefRank, predicts the underlying ranking with an SVM-based approach on the chosen embedding of the product graph, and is the first to provide statistical consistency on two ranking losses, Kendall's tau and Spearman's footrule, with a required sample complexity of O((n^2 χ(Ḡ))^(2/3)) pairs, χ(Ḡ) being the chromatic number of the complement graph Ḡ. Clearly, our sample complexity is smaller for dense graphs, with χ(Ḡ) characterizing the degree of node connectivity, which is also intuitive under the locality assumption: e.g. O(n^(4/3)) for a union of k-cliques, or O(n^(5/3)) for random and power-law graphs, quantities much smaller than the fundamental limit of Ω(n^2) for large n. This, for the first time, relates ranking complexity to structural properties of the graph. We also report experimental evaluations on different synthetic and real-world datasets, where our algorithm is shown to outperform state-of-the-art methods.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {conference}
}
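Kendall's tau, one of the two ranking losses for which PrefRank is shown consistent, counts the item pairs that two rankings order differently. A minimal illustration of the loss itself (not the authors' PrefRank code; item names and the dict representation are for illustration only):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs ordered differently by the two rankings.

    rank_a, rank_b: dicts mapping item -> position (lower = ranked higher).
    """
    disagreements = 0
    for i, j in combinations(list(rank_a), 2):
        # A pair is discordant if the two rankings order it oppositely.
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            disagreements += 1
    return disagreements

# Identical rankings disagree on no pairs; reversed rankings on all pairs.
truth = {"a": 0, "b": 1, "c": 2}
print(kendall_tau_distance(truth, truth))                     # 0
print(kendall_tau_distance(truth, {"a": 2, "b": 1, "c": 0}))  # 3
```

The Ω(n^2) lower bound in the abstract reflects that, without structure, all such pairwise comparisons may be needed.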
2018 

2.  Krishnapuram, Raghu; Rajkumar, Arun; Acharya, Adithya; Dhara, Nikhil; Goudar, Manjunath; Sarashetti, Akshay P.: Online domain adaptation by exploiting labeled features and proactive learning. Conference. Proceedings of the 2018 ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD), 11.-13.01.18, Goa, 2018. Abstract  BibTeX  Tags: Machine Learning  Links:
@conference{Krishnapuram2018,
  title = {Online domain adaptation by exploiting labeled features and proactive learning},
  author = {Raghu Krishnapuram and Arun Rajkumar and Adithya Acharya and Nikhil Dhara and Manjunath Goudar and Akshay P. Sarashetti},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/08/p137krishnapuram.pdf},
  doi = {10.1145/3152494.3152507},
  year = {2018},
  date = {2018-01-13},
  booktitle = {Proceedings of the 2018 ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD), 11.-13.01.18, Goa},
  pages = {137-145},
  abstract = {Domain adaptation algorithms have gained considerable interest in recent years due to their empirical success. However, most of the existing literature assumes the availability of a pool of unlabeled target-domain data, which is not the case in several practical scenarios. We consider the problem of online domain adaptation, where the target (test) data arrive one at a time and a fixed budget is available for obtaining their labels. We propose a novel online domain adaptation algorithm which uses the budget judiciously by balancing the cost and reliability of the oracles which provide the labels. The proposed algorithm uses a probabilistic model which allows the seamless integration of previously unseen features (i.e., features that may appear at test time) into the model. Experiments on several benchmark real-world datasets empirically establish the efficacy of the proposed algorithm.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {conference}
}
3.  Joseph, Ajin George; Bhatnagar, Shalabh: An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method. Journal Article. arXiv: Computer Science, 2018. Abstract  BibTeX  Tags: Machine Learning  Links:
@article{Joseph2018,
  title = {An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method},
  author = {Ajin George Joseph and Shalabh Bhatnagar},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/12/1806.06720.pdf},
  year = {2018},
  date = {2018-06-15},
  journal = {arXiv: Computer Science},
  abstract = {In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {article}
}
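The cross entropy method that the paper adapts maintains a parametric sampling distribution, draws candidate solutions, and refits the distribution to an elite fraction of them. A generic one-dimensional sketch of CE maximization of a real-valued function (the paper's multi-timescale, RL-specific variant is substantially more involved; all parameter values here are illustrative):

```python
import random

def cross_entropy_maximize(f, mu=0.0, sigma=5.0, n_samples=200,
                           elite_frac=0.1, iters=60):
    """Model-based search: each round, fit a Gaussian to the elite samples."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(iters):
        xs = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elite = sorted(xs, key=f, reverse=True)[:n_elite]
        mu = sum(elite) / n_elite                      # refit mean
        var = sum((x - mu) ** 2 for x in elite) / n_elite
        sigma = max(var ** 0.5, 1e-3)                  # refit std, keep > 0
    return mu

# The global maximum of -(x - 3)^2 is at x = 3; CE homes in on it.
print(cross_entropy_maximize(lambda x: -(x - 3.0) ** 2))  # close to 3.0
```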
4.  Maiya, Shishira R.; Babu, Sudharshan Chandra: Slum segmentation and change detection: A deep learning approach. Journal Article. arXiv: Computer Science, 2018. Abstract  BibTeX  Tags: Machine Learning  Links:
@article{Maiya2018,
  title = {Slum segmentation and change detection: A deep learning approach},
  author = {Shishira R. Maiya and Sudharshan Chandra Babu},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/12/1811.07896.pdf},
  year = {2018},
  date = {2018-11-19},
  journal = {arXiv: Computer Science},
  abstract = {More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. Thus, slum rehabilitation and improvement is an important global challenge, and a significant amount of effort and resources have been put into this endeavor. These initiatives rely heavily on slum mapping and monitoring, and it is essential to have robust and efficient methods for mapping and monitoring existing slum settlements. In this work, we introduce an approach to segment and map individual slums from satellite imagery, leveraging regional convolutional neural networks for instance segmentation using transfer learning. In addition, we also introduce a method to perform change detection and monitor slum change over time. We show that our approach effectively learns slum shape and appearance and demonstrates strong quantitative results, achieving a maximum AP of 80.0.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {article}
}
2017 

5.  Joseph, Ajin George; Bhatnagar, Shalabh: An incremental fast policy search using a single sample path. Conference. Proceedings of the 7th International Conference on Pattern Recognition and Machine Intelligence (PReMI), 05.-08.12.17, Kolkata (India), Lecture Notes in Computer Science, 2017. Abstract  BibTeX  Tags: Machine Learning  Links:
@conference{Joseph2017c,
  title = {An incremental fast policy search using a single sample path},
  author = {Ajin George Joseph and Shalabh Bhatnagar},
  doi = {10.1007/978-3-319-69900-4_1},
  year = {2017},
  date = {2017-11-01},
  booktitle = {Proceedings of the 7th International Conference on Pattern Recognition and Machine Intelligence (PReMI), 05.-08.12.17, Kolkata (India)},
  pages = {3-10},
  series = {Lecture Notes in Computer Science},
  abstract = {In this paper, we consider the control problem in a reinforcement learning setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy which optimizes the long-run γ-discounted transition costs, where γ ∈ [0,1). Such approaches also assume access to a generative model/simulator of the underlying MDP, with the hidden premise that realizations of the system dynamics of the MDP for arbitrary policies, in the form of sample paths, can be obtained with ease from the model. In this paper, we consider a cost function which is the expectation of an approximate value function w.r.t. the steady-state distribution of the Markov chain induced by the policy, without access to the generative model. We assume that a single sample path generated using an a priori chosen behaviour policy is made available. In this information-restricted setting, we solve the generalized control problem using the incremental cross entropy method. The proposed algorithm is shown to converge to the solution which is globally optimal relative to the behaviour policy.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {conference}
}
6.  Joseph, Ajin George; Bhatnagar, Shalabh: Bounds for off-policy prediction in reinforcement learning. Conference. Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), 14.-19.05.17, Anchorage (USA), 2017. Abstract  BibTeX  Tags: Machine Learning  Links:
@conference{Joseph2017b,
  title = {Bounds for off-policy prediction in reinforcement learning},
  author = {Ajin George Joseph and Shalabh Bhatnagar},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/01/07966359.pdf},
  doi = {10.1109/IJCNN.2017.7966359},
  year = {2017},
  date = {2017-07-03},
  booktitle = {Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), 14.-19.05.17, Anchorage (USA)},
  pages = {3991-3997},
  abstract = {In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy of interest using the linear function approximation architecture, by utilizing a sample trajectory generated by a behaviour policy which is possibly different from the target policy. The stability of off-policy prediction has been an open question for a long time. Only recently could Yu provide a generalized proof, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where the sample trajectory is hard to obtain and one has to rely on the sample behaviour of the system with respect to an arbitrary policy. We provide here an error bound on the solution of off-policy prediction with respect to a closeness measure between the target and the behaviour policy.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {conference}
}
7.  Sankaran, Raman; Bach, Francis R.; Bhattacharyya, Chiranjib: Identifying groups of strongly correlated variables through smoothed ordered weighted L1-norms. Journal Article. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 20.-22.04.17, Fort Lauderdale (USA), pp. 1123-1131, 2017. Abstract  BibTeX  Tags: Machine Learning  Links:
@article{Sankaran2017,
  title = {Identifying groups of strongly correlated variables through smoothed ordered weighted L1-norms},
  author = {Raman Sankaran and Francis R. Bach and Chiranjib Bhattacharyya},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/01/sankaran17a.pdf},
  year = {2017},
  date = {2017-08-16},
  journal = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 20.-22.04.17, Fort Lauderdale (USA)},
  pages = {1123-1131},
  abstract = {The failure of LASSO to identify groups of correlated predictors in linear regression has sparked significant research interest. Recently, various norms were proposed, which can be best described as instances of ordered weighted ℓ1 norms (OWL), as an alternative to the ℓ1 regularization used in LASSO. OWL can identify groups of correlated variables, but it forces the model to be constant within a group. This artifact induces unnecessary bias in the model estimation. In this paper we take a submodular perspective and show that OWL can be posed as the Lovász extension of a suitably defined submodular function. The submodular perspective not only explains the group-wise constant behavior of OWL, but also suggests alternatives. The main contribution of this paper is smoothed OWL (SOWL), a new family of norms, which not only identifies the groups but also allows the model to be flexible inside a group. We establish several algorithmic and theoretical properties of SOWL, including group identification and model consistency. We also provide algorithmic tools to compute the SOWL norm and its proximal operator, whose O(d log d) computational complexity is significantly better than the O(d^2 log d) of general-purpose solvers. In our experiments, SOWL compares favorably with OWL in the regimes of interest.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {article}
}
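The ordered weighted ℓ1 norm at the heart of OWL and SOWL pairs a non-negative, non-increasing weight vector with the absolute entries of x sorted in non-increasing order. A minimal sketch of the plain OWL norm (the SOWL smoothing and its O(d log d) proximal operator from the paper are not reproduced here):

```python
def owl_norm(x, w):
    """Ordered weighted L1: sum_i w[i] * |x|_(i), with |x| sorted
    non-increasingly. w must be non-negative and non-increasing
    for this to define a norm."""
    abs_sorted = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * vi for wi, vi in zip(w, abs_sorted))

# With all weights equal, OWL reduces to a scaled L1 norm;
# with w = (1, 0, ..., 0) it reduces to the L-infinity norm.
x = [3.0, -1.0, 2.0]
print(owl_norm(x, [1.0, 1.0, 1.0]))  # 6.0 (L1)
print(owl_norm(x, [1.0, 0.0, 0.0]))  # 3.0 (L-inf)
```

The two special cases above show how the weight profile interpolates between known norms, which is what lets OWL-type penalties cluster correlated coefficients.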
2013 

8.  Bhatnagar, Shalabh; Lakshmanan, K.: An online convergent Q-learning algorithm with linear function approximation. Technical Report, 2013. Abstract  BibTeX  Tags: Machine Learning  Links:
@techreport{Bhatnagar2013,
  title = {An online convergent Q-learning algorithm with linear function approximation},
  author = {Shalabh Bhatnagar and K. Lakshmanan},
  url = {http://www.rbccps.org/wpcontent/uploads/2018/12/5a5ce39a35fe00c7a4b97982dbb0c933f1ed.pdf},
  year = {2013},
  date = {2013-09-13},
  abstract = {We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust compared to Q-learning.},
  keywords = {Machine Learning},
  pubstate = {published},
  tppubtype = {techreport}
}
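For reference, the standard Q-learning update with linear function approximation, whose convergence the report addresses via a two-timescale variant, moves a weight vector along the TD error times the feature vector. A generic single-timescale sketch with hypothetical features (this is the baseline scheme, not the authors' convergent algorithm):

```python
def q_update(theta, phi_sa, reward, next_phis, gamma=0.9, alpha=0.05):
    """One Q-learning step: theta <- theta + alpha * delta * phi(s, a),
    where delta is the TD error with a max over next-state action features."""
    q_sa = sum(t * p for t, p in zip(theta, phi_sa))
    q_next = max(sum(t * p for t, p in zip(theta, phi)) for phi in next_phis)
    delta = reward + gamma * q_next - q_sa          # TD error
    return [t + alpha * delta * p for t, p in zip(theta, phi_sa)]

# Toy check: repeated updates on a terminal-like transition (next-state
# value forced to 0 via zero features) drive Q(s, a) toward the reward.
theta = [0.0, 0.0]
phi = [1.0, 0.5]
for _ in range(500):
    theta = q_update(theta, phi, reward=1.0, next_phis=[[0.0, 0.0]])
q_sa = sum(t * p for t, p in zip(theta, phi))
print(round(q_sa, 2))  # ~1.0
```

The off-policy instability mentioned in the abstract arises because the max in the TD error is taken under a policy different from the one generating the data; the sketch above simply ignores that issue.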