A Study on Privacy Preserving Big Data Mining: Techniques and Challenges
Abstract
The basic goal of data mining algorithms is to extract previously undiscovered patterns from data. While the data are being mined, sensitive and confidential information must also be protected to preserve privacy. With the widespread use of information technology, organisations such as hospitals, insurance providers, banks, e-commerce companies, and stock exchanges are producing enormous amounts of data at an exponential rate, which makes privacy a crucial concern in data mining. Anonymization, perturbation, generalization, and cryptography are among the privacy-preserving data mining techniques that have been proposed in the literature. In this study, we review these state-of-the-art techniques, present a tabular comparison of the work done by different authors, and discuss the challenges of privacy-preserving data mining.